
Orchestrating dbt with Airflow - Mini Project: Build & Apply

📖 Scenario: You work as a data analyst at a company that uses dbt (data build tool) to transform raw data into clean, usable tables. To automate these transformations, your team uses Apache Airflow, a tool that schedules and runs workflows. Today, you will create a simple Airflow DAG (Directed Acyclic Graph) to run dbt commands automatically. This will help your team keep data fresh without manual work.
🎯 Goal: Build an Airflow DAG that runs dbt commands to build and test your data models. You will create the DAG, configure the dbt commands, and print the results.
📋 What You'll Learn
Create a Python dictionary called default_args with Airflow DAG default settings.
Create a variable called dbt_run_command with the dbt run shell command as a string.
Create a Python function called run_dbt that prints the dbt run command.
Create an Airflow DAG named dbt_dag that uses the run_dbt function as a task.
Print the task id of the dbt run task.
💡 Why This Matters
🌍 Real World
Automating dbt runs with Airflow helps data teams keep data models updated daily without manual intervention.
💼 Career
Data engineers and analysts often use Airflow to schedule and monitor dbt workflows in production data pipelines.
1
Set up Airflow DAG default arguments
Create a Python dictionary called default_args with these exact entries: 'owner': 'airflow', 'start_date': datetime(2024, 1, 1), and 'retries': 1. Import datetime from the datetime module first.
Need a hint?

Use default_args = {'owner': 'airflow', 'start_date': datetime(2024, 1, 1), 'retries': 1}.
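A minimal sketch of this step. `datetime` comes from Python's standard library; these defaults are applied to every task in the DAG:

```python
from datetime import datetime

# Default settings shared by all tasks in the DAG:
# - owner: label shown in the Airflow UI
# - start_date: first date the scheduler considers for runs
# - retries: how many times a failed task is retried
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
    'retries': 1,
}
```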

2
Create the dbt run command string
Create a variable called dbt_run_command and set it to the string 'dbt run --profiles-dir ./profiles' exactly.
Need a hint?

Set dbt_run_command = 'dbt run --profiles-dir ./profiles'.
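This step is just a string assignment. The `--profiles-dir` flag tells dbt where to find `profiles.yml`, the file holding its connection settings:

```python
# The shell command the DAG will eventually run; --profiles-dir points
# dbt at the directory containing profiles.yml (connection settings).
dbt_run_command = 'dbt run --profiles-dir ./profiles'

print(dbt_run_command)
```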

3
Define the dbt run function and Airflow DAG
Import DAG from airflow and PythonOperator from airflow.operators.python. Define a function called run_dbt that prints the dbt_run_command. Then create an Airflow DAG called dbt_dag with default_args and schedule_interval='@daily'. Inside the DAG context, create a PythonOperator task called dbt_run_task that runs the run_dbt function.
Need a hint?

Use with DAG('dbt_dag', default_args=default_args, schedule_interval='@daily') as dag: and create dbt_run_task = PythonOperator(task_id='dbt_run', python_callable=run_dbt).

4
Print the task id of the dbt run task
Write a print statement to display the task_id of the dbt_run_task.
Need a hint?

Use print(dbt_run_task.task_id) to show the task id.