
How to Set Task Dependencies in Airflow: Simple Guide

In Airflow, you set task dependencies with the set_upstream() and set_downstream() methods or with the bitshift operators >> (downstream) and << (upstream). Either form links tasks in your DAG code and so defines the order in which they run.

Syntax

Airflow allows you to set task dependencies in two main ways:

  • Using methods: task1.set_downstream(task2) means task2 runs after task1.
  • Using operators: task1 >> task2 means the same as above, making task2 downstream of task1.

Both ways create a directed link showing which task runs first and which runs next.

python
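# Method form: both lines declare the same link, task1 -> task2
task1.set_downstream(task2)
task2.set_upstream(task1)

# Bitshift form, equivalent to the above
task1 >> task2  # task1 runs before task2
task2 << task1  # same dependency, written from task2's side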
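The bitshift forms are ordinary Python operator overloading. The sketch below is a toy model, not Airflow's actual implementation: the `ToyTask` class and its `upstream` and `downstream` attributes are hypothetical and exist only to illustrate how `>>` and `<<` can delegate to `set_downstream()` and `set_upstream()`, and why a chain such as `a >> b >> c` works.

```python
class ToyTask:
    """Hypothetical stand-in for an Airflow operator (illustration only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # ids of tasks that must finish first
        self.downstream = set()  # ids of tasks that run afterwards

    def set_downstream(self, others):
        # Accept a single task or a list of tasks
        for other in others if isinstance(others, list) else [others]:
            self.downstream.add(other.task_id)
            other.upstream.add(self.task_id)

    def set_upstream(self, others):
        # "self runs after other" is the mirror image of set_downstream
        for other in others if isinstance(others, list) else [others]:
            other.set_downstream(self)

    def __rshift__(self, other):
        # self >> other: other runs after self
        self.set_downstream(other)
        return other  # returning the right-hand side lets a >> b >> c chain

    def __lshift__(self, other):
        # self << other: other runs before self
        self.set_upstream(other)
        return other


a, b, c = ToyTask("a"), ToyTask("b"), ToyTask("c")
a >> b >> c  # a -> b -> c
print(b.upstream, b.downstream)  # {'a'} {'c'}
```

Because each operator returns its right-hand operand, the expression is evaluated left to right as `(a >> b) >> c`, which links a to b and then b to c.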

Example

This example defines a simple DAG with three tasks: task1 runs first, and once it finishes, task2 and task3 run in parallel.

python
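from datetime import datetime

from airflow import DAG
# Note: newer Airflow releases deprecate DummyOperator in favor of
# EmptyOperator (airflow.operators.empty); use that import if available.
from airflow.operators.dummy import DummyOperator

default_args = {
    'start_date': datetime(2024, 1, 1),
}

# Note: Airflow 2.4+ prefers the `schedule` argument over `schedule_interval`.
dag = DAG('example_dependency_dag', default_args=default_args, schedule_interval='@daily')

# Define tasks
task1 = DummyOperator(task_id='task1', dag=dag)
task2 = DummyOperator(task_id='task2', dag=dag)
task3 = DummyOperator(task_id='task3', dag=dag)

# Set dependencies: task1 runs first, then task2 and task3
task1 >> [task2, task3]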
Output
No direct output. The DAG structure makes task1 run first; task2 and task3 both depend only on task1, so they can run in parallel once it finishes.
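To see why task2 and task3 become eligible at the same time, the dependency graph from the example can be replayed with Python's standard-library `graphlib` module (Python 3.9+). This is only an illustrative sketch: it does not use Airflow, and `upstream` and `waves` are hypothetical names for the dependency map and for the batches of tasks whose upstream tasks have all finished.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks),
# mirroring task1 >> [task2, task3]
upstream = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task1"},
}

sorter = TopologicalSorter(upstream)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks with no unfinished upstream tasks
    waves.append(ready)
    sorter.done(*ready)                 # mark the whole batch as finished

print(waves)  # [['task1'], ['task2', 'task3']]
```

Tasks in the same wave have no ordering between them. Whether Airflow actually runs them concurrently depends on the executor and the available worker slots.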

Common Pitfalls

Common mistakes when setting dependencies include:

  • Forgetting to set dependencies, so tasks that should run in sequence have no defined order and may run in parallel.
  • Mixing set_upstream() and set_downstream() within the same pipeline, which makes the run order hard to read.
  • Declaring dependencies where the scheduler never sees them, for example inside a task's callable at run time. Only dependencies declared when the DAG file is parsed take effect.

Always declare dependencies in the DAG file alongside the task definitions, and stick to one syntax.

python
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from datetime import datetime

default_args = {'start_date': datetime(2024, 1, 1)}

dag = DAG('wrong_dependency_dag', default_args=default_args, schedule_interval='@daily')

task1 = DummyOperator(task_id='task1', dag=dag)
task2 = DummyOperator(task_id='task2', dag=dag)

# Wrong: without a dependency, task1 and task2 have no defined order
# and may run in parallel.

# Right: declare the order explicitly.
task1 >> task2
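A missing dependency is easy to overlook because the DAG still parses without error. One possible safeguard is a small check for tasks that have neither upstream nor downstream links. The sketch below works on plain Python data rather than a real DAG: `find_isolated` and the `edges` list are hypothetical names introduced for illustration. In an actual DAG file, comparable information should be available from each task's `upstream_task_ids` and `downstream_task_ids` attributes.

```python
def find_isolated(task_ids, edges):
    """Return task ids that appear in no (upstream, downstream) edge."""
    linked = {task_id for edge in edges for task_id in edge}
    return sorted(set(task_ids) - linked)


task_ids = ["task1", "task2", "task3"]
edges = [("task1", "task2")]  # task3 was never wired into the pipeline

print(find_isolated(task_ids, edges))  # ['task3']
```

A single-task DAG would be reported too, so a check like this is a hint to review the wiring, not proof of a bug.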

Quick Reference

| Syntax                      | Meaning                                      |
|-----------------------------|----------------------------------------------|
| task1 >> task2              | task2 runs after task1                       |
| task2 << task1              | task2 runs after task1 (same as above)       |
| task1.set_downstream(task2) | task2 runs after task1                       |
| task2.set_upstream(task1)   | task2 runs after task1                       |
| task1 >> [task2, task3]     | task2 and task3 run after task1, in parallel |

Key Takeaways

  • Use task1 >> task2 or task1.set_downstream(task2) to make task2 run after task1.
  • Declare dependencies in the DAG definition so they take effect.
  • Pass a list to >> to set several downstream tasks at once.
  • Check for missing dependencies to prevent unintended parallel runs.
  • set_upstream() and set_downstream() both work; pick one style and use it consistently.