You have an Airflow DAG that runs dbt models on Snowflake every day at midnight. The DAG has three tasks: extract, transform, and load. The transform task runs dbt models using the dbt run command. What will happen if the extract task fails?
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

dag = DAG('daily_dbt', schedule_interval='0 0 * * *', start_date=datetime(2024, 1, 1))
extract = BashOperator(task_id='extract', bash_command='python extract.py', dag=dag)
transform = BashOperator(task_id='transform', bash_command='dbt run --profiles-dir ./profiles', dag=dag)
load = BashOperator(task_id='load', bash_command='python load.py', dag=dag)

extract >> transform >> load
Think about how Airflow handles task dependencies and failures.
In Airflow, a failed task blocks everything downstream of it: with the default trigger rule (all_success), downstream tasks are set to upstream_failed instead of running. Because transform depends on extract and load depends on transform, neither runs if extract fails, and the DAG run is marked failed.
You want to build a data pipeline using Airflow to orchestrate dbt models on Snowflake. Which architecture ensures that dbt models only run after the raw data is fully loaded into Snowflake?
Think about how to enforce order in task execution.
Setting explicit task dependencies in Airflow ensures that dbt models run only after the data load task completes successfully.
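A minimal sketch of that architecture, assuming a hypothetical load_raw.py script that lands the raw data in Snowflake:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

dag = DAG('snowflake_dbt', schedule_interval='@daily', start_date=datetime(2024, 1, 1))

# Land raw data in Snowflake first; the script name is illustrative.
load_raw = BashOperator(task_id='load_raw', bash_command='python load_raw.py', dag=dag)
dbt_run = BashOperator(task_id='dbt_run', bash_command='dbt run', dag=dag)

# dbt only starts after load_raw succeeds
load_raw >> dbt_run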
You need to securely manage Snowflake credentials used by dbt in an Airflow environment. Which approach follows best security practices?
Consider encryption and avoiding hardcoding secrets.
Storing credentials as encrypted Airflow Variables and referencing them from the dbt profile through environment variables keeps secrets out of code, version control, and logs.
You want to run dbt models from Airflow using the dbt run command. Which profiles.yml configuration snippet correctly sets up a Snowflake connection for Airflow?
profiles.yml snippet:
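The original answer options are not reproduced here; a representative snippet in the spirit of the correct answer, using dbt's env_var() function (the profile, role, database, warehouse, and schema names are illustrative):

```yaml
my_project:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: dbt
      threads: 4
```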
Consider how Airflow passes environment variables securely to tasks.
Using env_var() in profiles.yml lets dbt read credentials from environment variables that Airflow injects at runtime, which keeps secrets out of version control and is a best practice.
You want to optimize your Airflow-orchestrated dbt pipeline on Snowflake to reduce compute costs without sacrificing data freshness. Which strategy is best?
Think about balancing cost and data freshness with scheduling and warehouse settings.
A warehouse with a short auto-suspend shuts down when idle, so you stop paying for unused compute, and auto-resume restarts it on the next query. Pairing that with Airflow sensors that trigger dbt runs only when new data has landed avoids spending compute on runs that would produce no changes.