
How to Use set_downstream in Airflow for Task Dependencies

In Airflow, use set_downstream on a task to specify which tasks should run after it. For example, task1.set_downstream(task2) means task2 runs after task1. This helps control the order of task execution in your DAG.

Syntax

The set_downstream method is called on a task object and takes a single argument: either one task or a list of tasks. Every task you pass is set to run after the task the method is called on.

  • task.set_downstream(other_task): Makes other_task run after task.
  • You can pass a single task or a list of tasks.
python
task1.set_downstream(task2)
task1.set_downstream([task2, task3])
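
To make the two call forms concrete, the sketch below uses a toy stand-in class rather than a real Airflow operator, so it runs without an Airflow installation. `ToyTask` and its attributes are hypothetical names invented for this illustration. The only behavior borrowed from Airflow is that set_downstream accepts either one task or a list, and that the relationship is recorded on both sides.

```python
class ToyTask:
    """Toy stand-in for an Airflow task (hypothetical, for illustration only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream_ids = set()  # tasks that run after this one
        self.upstream_ids = set()    # tasks that run before this one

    def set_downstream(self, task_or_tasks):
        # Accept a single task or a list of tasks.
        tasks = task_or_tasks if isinstance(task_or_tasks, list) else [task_or_tasks]
        for other in tasks:
            self.downstream_ids.add(other.task_id)
            other.upstream_ids.add(self.task_id)


task1, task2, task3 = ToyTask("task1"), ToyTask("task2"), ToyTask("task3")

# List form: both task2 and task3 run after task1 (a fan-out).
task1.set_downstream([task2, task3])

print(sorted(task1.downstream_ids))  # ['task2', 'task3']
print(sorted(task2.upstream_ids))    # ['task1']
```

Passing a list is a compact way to declare a fan-out, where several tasks all wait on the same upstream task.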

Example

This example shows how to create two tasks and use set_downstream to make the second task run after the first.

python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'start_date': datetime(2024, 1, 1),
}

# Note: Airflow 2.4+ prefers the `schedule` argument; `schedule_interval` was removed in Airflow 3
dag = DAG('example_set_downstream', default_args=default_args, schedule_interval='@once')

# Define tasks
task1 = BashOperator(
    task_id='task1',
    bash_command='echo "Task 1 running"',
    dag=dag
)

task2 = BashOperator(
    task_id='task2',
    bash_command='echo "Task 2 running"',
    dag=dag
)

# Set task2 to run after task1
task1.set_downstream(task2)
Output
When the DAG runs, the task logs show these messages, in this order:
Task 1 running
Task 2 running
This shows that task2 starts only after task1 has finished successfully.
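
The ordering guarantee can be checked without running Airflow. The sketch below is not Airflow code: it feeds the same dependency (task2 depends on task1) to `graphlib.TopologicalSorter` from the Python standard library, assuming Python 3.9 or later, and prints an order that respects it. The third task is a hypothetical addition that shows a fan-out.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that must finish before it starts.
# task1.set_downstream(task2) corresponds to the entry "task2": {"task1"}.
# task3 is a hypothetical extra task added for this illustration.
upstream = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task1"},
}

order = list(TopologicalSorter(upstream).static_order())

# task1 always comes first; task2 and task3 have no dependency on each
# other, so either may appear before the other.
print(order)
```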

Common Pitfalls

Common mistakes when using set_downstream include:

  • Calling set_downstream on the wrong task, reversing the order.
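  • Passing something that is not a task or a list of tasks (for example, a task_id string), which raises an error.
  • Mixing set_downstream with set_upstream or the bitshift operators in the same DAG, which makes the dependency order harder to read.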

Always ensure the task you call set_downstream on is the one that should run first.

python
# Assumes task1 and task2 are defined as in the example above.

# Wrong: this makes task1 run after task2, reversing the intended order
# task2.set_downstream(task1)

# Correct: task1 runs first, then task2
task1.set_downstream(task2)
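
Declaring both directions between the same pair of tasks is worse than a simple reversal: the graph then contains a cycle and no valid run order exists. Airflow refuses to load a DAG that contains a cycle. The sketch below is not Airflow code. It uses `graphlib` from the Python standard library (3.9 or later) to show the same condition being detected.

```python
from graphlib import CycleError, TopologicalSorter

# Each key lists the tasks that must run before it.
# Both directions are declared here, mimicking
# task2.set_downstream(task1) followed by task1.set_downstream(task2).
upstream = {
    "task1": {"task2"},
    "task2": {"task1"},
}

try:
    list(TopologicalSorter(upstream).static_order())
    has_cycle = False
except CycleError:
    # No ordering can satisfy both constraints at once.
    has_cycle = True

print(has_cycle)  # True
```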

Quick Reference

| Method | Description | Example |
| --- | --- | --- |
| set_downstream | Sets tasks to run after the current task | task1.set_downstream(task2) |
| set_upstream | Sets tasks to run before the current task | task2.set_upstream(task1) |
| Bitshift operator >> | Shorthand for set_downstream | task1 >> task2 |
| Bitshift operator << | Shorthand for set_upstream | task2 << task1 |
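
The four rows describe the same edge written four ways. The sketch below uses a hypothetical `ToyTask` class (not Airflow's implementation) that wires `>>` and `<<` to the two methods through Python's `__rshift__` and `__lshift__` hooks. It then checks that every spelling records an identical relationship.

```python
class ToyTask:
    """Toy stand-in for an Airflow task (hypothetical, for illustration only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream_ids = set()

    def set_downstream(self, other):
        self.downstream_ids.add(other.task_id)

    def set_upstream(self, other):
        # "other runs before me" is the same edge as "I run after other".
        other.set_downstream(self)

    def __rshift__(self, other):  # self >> other
        self.set_downstream(other)
        return other  # returning the right-hand task allows a >> b >> c

    def __lshift__(self, other):  # self << other
        self.set_upstream(other)
        return other


def edge_after(declare):
    """Build two fresh tasks, apply one declaration style, return task1's downstream ids."""
    task1, task2 = ToyTask("task1"), ToyTask("task2")
    declare(task1, task2)
    return task1.downstream_ids


results = [
    edge_after(lambda t1, t2: t1.set_downstream(t2)),
    edge_after(lambda t1, t2: t2.set_upstream(t1)),
    edge_after(lambda t1, t2: t1 >> t2),
    edge_after(lambda t1, t2: t2 << t1),
]
print(results)  # [{'task2'}, {'task2'}, {'task2'}, {'task2'}]
```

Because all four forms produce the same edge, the choice between them is about readability. Picking one style per DAG keeps the dependency direction easy to follow.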

Key Takeaways

  • Use set_downstream to specify which tasks run after the current task in Airflow.
  • Call set_downstream on the task that should run first, and pass it the task or list of tasks that should follow.
  • Check the direction of each call: calling set_downstream on the wrong task reverses the order.
  • Consider the >> operator as a more readable alternative to set_downstream.