Apache Airflow · devops · ~30 mins

ExternalTaskSensor for cross-DAG dependencies in Apache Airflow - Mini Project: Build & Apply

ExternalTaskSensor for cross-DAG dependencies
📖 Scenario: You are managing workflows in Apache Airflow. You have two DAGs: dag_a and dag_b. dag_b should wait for a specific task in dag_a to complete before starting its own tasks. This is a common real-world need when workflows depend on each other.
🎯 Goal: Build an Airflow DAG dag_b that uses ExternalTaskSensor to wait for the task task_in_dag_a in dag_a to finish before running its own task.
📋 What You'll Learn
Create a DAG called dag_b with default arguments
Use ExternalTaskSensor to wait for task_in_dag_a in dag_a
Add a simple PythonOperator task called task_in_dag_b that runs after the sensor
Print a message in task_in_dag_b to confirm it runs after the sensor
💡 Why This Matters
🌍 Real World
In real projects, workflows often depend on other workflows. ExternalTaskSensor helps coordinate these dependencies safely.
💼 Career
Understanding cross-DAG dependencies is essential for Airflow users and DevOps engineers managing complex data pipelines.
Step 1: Create the DAG dag_b with default arguments
Write code that imports DAG from airflow and creates a DAG called dag_b with start_date set to datetime(2024, 1, 1) and schedule_interval set to '@daily'. Use default_args with owner set to 'airflow'.
Hint: Import DAG from airflow and datetime from the datetime module. Define default_args as a dictionary with 'owner': 'airflow', then create dag_b with those defaults, the start date, and the daily schedule.

Step 2: Add an ExternalTaskSensor to wait for task_in_dag_a in dag_a
Import ExternalTaskSensor from airflow.sensors.external_task. Create a task called wait_for_task_in_dag_a using ExternalTaskSensor that waits for the task task_in_dag_a in DAG dag_a. Set dag parameter to dag_b.
Hint: Use ExternalTaskSensor with task_id='wait_for_task_in_dag_a', external_dag_id='dag_a', and external_task_id='task_in_dag_a'. Pass dag=dag_b to attach the task to the DAG. By default the sensor waits for the dag_a run with the same logical date as the current dag_b run, so dag_a should be on the same @daily schedule (or set execution_delta to account for an offset).

Step 3: Add a PythonOperator task called task_in_dag_b
Import PythonOperator from airflow.operators.python. Define a Python function called print_message that prints 'Task in dag_b is running'. Create a PythonOperator task named task_in_dag_b that calls print_message and belongs to dag_b. Set task_in_dag_b to run after wait_for_task_in_dag_a.
Hint: Define a function print_message that prints the message. Create a PythonOperator with task_id='task_in_dag_b' and python_callable=print_message, then use wait_for_task_in_dag_a >> task_in_dag_b to set the order.

Step 4: Print a confirmation message when task_in_dag_b runs
Run the DAG dag_b and observe the output of task_in_dag_b. It should print 'Task in dag_b is running' once wait_for_task_in_dag_a completes.
Hint: Trigger dag_b from the Airflow UI or CLI, then check the task logs of task_in_dag_b for the printed message.
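For a quick check from the command line (assuming the Airflow CLI is available and dag_b has been loaded by the scheduler), you could run:

```shell
# Run just task_in_dag_b locally, ignoring dependencies, to confirm
# the print output during development:
airflow tasks test dag_b task_in_dag_b 2024-01-01

# Trigger a full run of dag_b, including the sensor:
airflow dags trigger dag_b
```

Note that airflow tasks test prints the task output straight to the terminal but skips the sensor; a triggered run exercises the real cross-DAG dependency, and the message then appears in the logs of task_in_dag_b.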