How to Set Task Dependencies in Airflow: Simple Guide
In Airflow, you set task dependencies with the set_upstream() and set_downstream() methods or with the bitshift operators >> (downstream) and << (upstream). Each of these links two tasks in your DAG code and so defines the order in which they run.
Syntax
Airflow allows you to set task dependencies in two main ways:
- Using methods: task1.set_downstream(task2) means task2 runs after task1.
- Using operators: task1 >> task2 means the same as above, making task2 downstream of task1.
Both ways create a directed link showing which task runs first and which runs next.
```python
task1.set_downstream(task2)
task2.set_upstream(task1)

# Or equivalently
task1 >> task2  # means task1 runs before task2
task2 << task1  # means the same
```
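To see why these forms are interchangeable, it can help to look at a toy model of the operators. The sketch below is not Airflow's actual implementation; the Task class and its upstream/downstream attributes are invented for illustration. It only shows how Python's `__rshift__`, `__lshift__`, and `__rrshift__` hooks can delegate to set_downstream() and set_upstream().

```python
class Task:
    """Toy stand-in for an Airflow task; it only records dependency edges."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # ids of tasks that must finish first
        self.downstream = set()  # ids of tasks that run afterwards

    @staticmethod
    def _as_list(other):
        # Accept either a single task or a list of tasks.
        return other if isinstance(other, list) else [other]

    def set_downstream(self, other):
        for task in self._as_list(other):
            self.downstream.add(task.task_id)
            task.upstream.add(self.task_id)

    def set_upstream(self, other):
        for task in self._as_list(other):
            task.set_downstream(self)

    def __rshift__(self, other):   # self >> other
        self.set_downstream(other)
        return other               # returning the right side allows chaining

    def __lshift__(self, other):   # self << other
        self.set_upstream(other)
        return other

    def __rrshift__(self, other):  # [a, b] >> self
        self.set_upstream(other)
        return self


def edge_after(link):
    """Create two fresh tasks, link them, and return the recorded edge."""
    task1, task2 = Task("task1"), Task("task2")
    link(task1, task2)
    return task1.downstream, task2.upstream


forms = [
    lambda task1, task2: task1.set_downstream(task2),
    lambda task1, task2: task2.set_upstream(task1),
    lambda task1, task2: task1 >> task2,
    lambda task1, task2: task2 << task1,
]
results = [edge_after(link) for link in forms]

# Chaining and lists: a >> [b, c] >> d
a, b, c, d = Task("a"), Task("b"), Task("c"), Task("d")
a >> [b, c] >> d
```

In this model every entry in results is the same edge (task1 upstream of task2), and after the chained expression d has both b and c as upstream tasks.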
Example
This example shows a simple DAG with three tasks where task1 runs first, then task2 and task3 run after task1 in parallel.
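```python
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from datetime import datetime

# Note: newer Airflow releases deprecate DummyOperator in favor of
# EmptyOperator (airflow.operators.empty) and schedule_interval in favor
# of schedule. Adjust these names to match your Airflow version.

default_args = {
    'start_date': datetime(2024, 1, 1),
}

dag = DAG('example_dependency_dag', default_args=default_args, schedule_interval='@daily')

# Define tasks
task1 = DummyOperator(task_id='task1', dag=dag)
task2 = DummyOperator(task_id='task2', dag=dag)
task3 = DummyOperator(task_id='task3', dag=dag)

# Set dependencies: task2 and task3 both wait for task1
task1 >> [task2, task3]
```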
Output
This code prints nothing. It only defines the DAG structure: task1 runs first, and task2 and task3 run in parallel after task1 completes successfully.
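The run order this structure implies can be sketched without Airflow, using Python's standard graphlib module (Python 3.9 or later). This is only an illustration of the ordering, not how the Airflow scheduler works internally; the upstream mapping and the run_batches helper are names made up for this example.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it waits for (its upstream tasks).
# This mirrors task1 >> [task2, task3].
upstream = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task1"},
}


def run_batches(upstream):
    """Group tasks into batches; tasks in one batch could run in parallel."""
    sorter = TopologicalSorter(upstream)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream is done
        batches.append(ready)
        sorter.done(*ready)                 # mark the whole batch finished
    return batches


batches = run_batches(upstream)
print(batches)  # [['task1'], ['task2', 'task3']]
```

The first batch holds only task1. The second holds task2 and task3 together, which is what "in parallel" means here: neither depends on the other.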
Common Pitfalls
Common mistakes when setting dependencies include:
- Forgetting to set dependencies, causing tasks to run in parallel unintentionally.
- Using set_upstream and set_downstream inconsistently, which can confuse the order.
- Trying to set dependencies outside the DAG context or after DAG parsing, which has no effect.
Always set dependencies inside the DAG definition and use consistent syntax.
```python
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from datetime import datetime

default_args = {'start_date': datetime(2024, 1, 1)}

dag = DAG('wrong_dependency_dag', default_args=default_args, schedule_interval='@daily')

task1 = DummyOperator(task_id='task1', dag=dag)
task2 = DummyOperator(task_id='task2', dag=dag)

# Wrong: no dependency set, tasks run in parallel
# Right: task1 >> task2
```
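One way to catch a forgotten dependency is to look for tasks that have no upstream and no downstream tasks. The sketch below assumes objects that expose task_id, upstream_task_ids, and downstream_task_ids, and a DAG object that exposes tasks, as Airflow's operators and DAGs do. The find_isolated_tasks helper and the SimpleNamespace stand-ins are hypothetical and exist only so the example runs without Airflow installed.

```python
from types import SimpleNamespace


def find_isolated_tasks(dag):
    """Return ids of tasks with neither upstream nor downstream tasks."""
    return sorted(
        task.task_id
        for task in dag.tasks
        if not task.upstream_task_ids and not task.downstream_task_ids
    )


def fake_task(task_id, upstream=(), downstream=()):
    # Hypothetical stand-in for an Airflow operator instance.
    return SimpleNamespace(
        task_id=task_id,
        upstream_task_ids=set(upstream),
        downstream_task_ids=set(downstream),
    )


# Two tasks with no dependency declared between them.
unlinked_dag = SimpleNamespace(tasks=[fake_task("task1"), fake_task("task2")])

# The same tasks as they would look after task1 >> task2.
linked_dag = SimpleNamespace(
    tasks=[
        fake_task("task1", downstream=["task2"]),
        fake_task("task2", upstream=["task1"]),
    ]
)

isolated_before = find_isolated_tasks(unlinked_dag)
isolated_after = find_isolated_tasks(linked_dag)
print(isolated_before)  # ['task1', 'task2']
print(isolated_after)   # []
```

An isolated task is not always a mistake: a DAG with a single task, or tasks that are independent by design, will also show up. Treat the result as a prompt to double-check, not as an error.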
Quick Reference
| Syntax | Meaning |
|---|---|
| task1 >> task2 | task2 runs after task1 |
| task2 << task1 | task2 runs after task1 (same as above) |
| task1.set_downstream(task2) | task2 runs after task1 |
| task2.set_upstream(task1) | task2 runs after task1 |
| task1 >> [task2, task3] | task2 and task3 run after task1 in parallel |
Key Takeaways
Use >> or set_downstream() to set a task to run after another.
Set dependencies inside the DAG definition to ensure they take effect.
Use lists with >> to set multiple downstream tasks at once.
Avoid missing dependencies to prevent unintended parallel runs.
Both set_upstream() and set_downstream() work, but pick one style and use it consistently.