You have an Airflow DAG that runs dbt models on Snowflake every day at midnight. The DAG has three tasks: extract, transform, and load. The transform task runs dbt models using the dbt run command. What will happen if the extract task fails?
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

dag = DAG('daily_dbt', schedule_interval='0 0 * * *', start_date=datetime(2024, 1, 1))
extract = BashOperator(task_id='extract', bash_command='python extract.py', dag=dag)
transform = BashOperator(task_id='transform', bash_command='dbt run --profiles-dir ./profiles', dag=dag)
load = BashOperator(task_id='load', bash_command='python load.py', dag=dag)

extract >> transform >> load
Think about how Airflow handles task dependencies and failures.
In Airflow, a failed task blocks everything downstream of it: with the default trigger rule (all_success), downstream tasks are set to upstream_failed instead of running. Because transform depends on extract and load depends on transform, neither runs if extract fails, and the DAG run is marked failed.
You want to build a data pipeline using Airflow to orchestrate dbt models on Snowflake. Which architecture ensures that dbt models only run after the raw data is fully loaded into Snowflake?
Think about how to enforce order in task execution.
Setting explicit task dependencies in Airflow ensures that dbt models run only after the data load task completes successfully.
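A minimal sketch of that architecture, assuming a hypothetical load_raw.py script that lands the raw data in Snowflake:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

dag = DAG('snowflake_dbt', schedule_interval='@daily', start_date=datetime(2024, 1, 1))

# Land raw data in Snowflake first; the script name is illustrative.
load_raw = BashOperator(task_id='load_raw', bash_command='python load_raw.py', dag=dag)
dbt_run = BashOperator(task_id='dbt_run', bash_command='dbt run', dag=dag)

# dbt only starts after load_raw succeeds
load_raw >> dbt_run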
You need to securely manage Snowflake credentials used by dbt in an Airflow environment. Which approach follows best security practices?
Consider encryption and avoiding hardcoding secrets.
Storing credentials as encrypted Airflow Variables and referencing them from the dbt profile through environment variables keeps secrets out of code, version control, and logs.
You want to run dbt models from Airflow using the dbt run command. Which profiles.yml configuration snippet correctly sets up a Snowflake connection for Airflow?
profiles.yml snippet:
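The original answer options are not reproduced here; a representative snippet in the spirit of the correct answer, using dbt's env_var() function (the profile, role, database, warehouse, and schema names are illustrative):

```yaml
my_project:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: dbt
      threads: 4
```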
Consider how Airflow passes environment variables securely to tasks.
Using env_var() in profiles.yml lets dbt read credentials from environment variables that Airflow injects at runtime, which keeps secrets out of version control and is a best practice.
You want to optimize your Airflow-orchestrated dbt pipeline on Snowflake to reduce compute costs without sacrificing data freshness. Which strategy is best?
Think about balancing cost and data freshness with scheduling and warehouse settings.
A warehouse with a short auto-suspend shuts down when idle, so you stop paying for unused compute, and auto-resume restarts it on the next query. Pairing that with Airflow sensors that trigger dbt runs only when new data has landed avoids spending compute on runs that would produce no changes.