How to Use set_upstream in Airflow for Task Dependencies
In Airflow, set_upstream is a method that declares another task as a dependency that must run before the current task. You call task2.set_upstream(task1) to ensure task1 runs before task2. This helps control the order of task execution in your DAG.
Syntax
The set_upstream method is called on a task object to specify which task should run before it. The syntax is:
task.set_upstream(other_task): This means other_task must complete before task starts.
Here, task and other_task are Airflow task objects, such as operator instances defined in the same DAG.
```python
task2.set_upstream(task1)
```
Example
This example shows two tasks where task1 runs before task2 using set_upstream. It demonstrates how to set dependencies in a DAG.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    'start_date': datetime(2024, 1, 1),
}

# Note: schedule_interval is the Airflow 2 spelling; Airflow 2.4+ prefers
# schedule, and schedule_interval was removed in Airflow 3.
dag = DAG('example_set_upstream', default_args=default_args, schedule_interval='@daily')

task1 = BashOperator(
    task_id='task1',
    bash_command='echo "Task 1 running"',
    dag=dag
)

task2 = BashOperator(
    task_id='task2',
    bash_command='echo "Task 2 running"',
    dag=dag
)

# Set task1 to run before task2
task2.set_upstream(task1)
```
Output
When the DAG runs, Airflow executes task1 first, then task2 after task1 completes successfully.
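The ordering described above can be illustrated without running Airflow. The sketch below is not Airflow code. It uses Python's standard-library graphlib.TopologicalSorter, whose input maps each node to the nodes that must come first, which is the same relationship that set_upstream records. The task names are reused from the example, and task3 is a hypothetical extra task added only for illustration.

```python
from graphlib import TopologicalSorter

# Map each task to its upstream tasks (the tasks that must finish first).
# "task2": {"task1"} mirrors task2.set_upstream(task1).
# task3 is a hypothetical extra task, not part of the example DAG.
upstream = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task2"},
}

# static_order() yields tasks so that every upstream task comes first.
execution_order = list(TopologicalSorter(upstream).static_order())
print(execution_order)
```

Because the three tasks form a single chain, only one valid order exists. With branching dependencies, several orders can be valid, and independent tasks may run in parallel.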
Common Pitfalls
Common mistakes when using set_upstream include:
- Confusing set_upstream with set_downstream. set_upstream means the argument runs before the caller, while set_downstream means the argument runs after.
- Not setting dependencies properly, which can cause tasks to run in the wrong order or in parallel unexpectedly.
- Using set_upstream on tasks from different DAGs, which is not allowed.
Example of wrong and right usage:
```python
# Wrong: task1.set_upstream(task2) means task2 runs before task1
# Right: task2.set_upstream(task1) means task1 runs before task2
```
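To make the direction rule and the different-DAG restriction concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not Airflow's implementation: the Task class, its attribute names, the dag_id values, and the ValueError are all hypothetical choices made for this illustration, and Airflow raises its own exception type in the cross-DAG case.

```python
class Task:
    """Toy stand-in for an Airflow task; not Airflow's real implementation."""

    def __init__(self, task_id, dag_id):
        self.task_id = task_id
        self.dag_id = dag_id
        self.upstream_task_ids = set()
        self.downstream_task_ids = set()

    def set_upstream(self, other):
        # The argument (other) must run before the caller (self).
        if other.dag_id != self.dag_id:
            raise ValueError("tasks belong to different DAGs")
        self.upstream_task_ids.add(other.task_id)
        other.downstream_task_ids.add(self.task_id)

    def set_downstream(self, other):
        # The argument (other) must run after the caller (self).
        other.set_upstream(self)


task1 = Task("task1", dag_id="example_set_upstream")
task2 = Task("task2", dag_id="example_set_upstream")
outsider = Task("task_x", dag_id="another_dag")  # hypothetical second DAG

# Right: task1 becomes upstream of task2, and both sides record the link.
task2.set_upstream(task1)
print(task2.upstream_task_ids)
print(task1.downstream_task_ids)

# Linking tasks from different DAGs is rejected in this sketch.
try:
    task2.set_upstream(outsider)
    cross_dag_allowed = True
except ValueError:
    cross_dag_allowed = False
print(cross_dag_allowed)
```

Recording the relationship on both tasks is what makes the two methods interchangeable: one call updates the upstream set of one task and the downstream set of the other.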
Quick Reference
Use set_upstream to define that one task must finish before another starts. Calling task2.set_upstream(task1) is equivalent to calling task1.set_downstream(task2).
Remember:
- task2.set_upstream(task1): task1 runs before task2
- task1.set_downstream(task2): task1 runs before task2 (same as above)
| Method | Meaning |
|---|---|
| task2.set_upstream(task1) | task1 runs before task2 |
| task1.set_downstream(task2) | task1 runs before task2 |
| task1.set_upstream(task2) | task2 runs before task1 (usually a mistake) |
| task2.set_downstream(task1) | task2 runs before task1 (usually a mistake) |
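The table can be checked mechanically with a small plain-Python sketch. The two helper functions below are hypothetical and exist only for this illustration. Each returns the (runs first, runs second) pair implied by the corresponding method call, without involving Airflow.

```python
def set_upstream(caller, argument):
    """Pair implied by caller.set_upstream(argument): argument runs first."""
    return (argument, caller)


def set_downstream(caller, argument):
    """Pair implied by caller.set_downstream(argument): caller runs first."""
    return (caller, argument)


# One entry per table row, as a (runs first, runs second) pair.
rows = {
    "task2.set_upstream(task1)": set_upstream("task2", "task1"),
    "task1.set_downstream(task2)": set_downstream("task1", "task2"),
    "task1.set_upstream(task2)": set_upstream("task1", "task2"),
    "task2.set_downstream(task1)": set_downstream("task2", "task1"),
}

for call, (first, second) in rows.items():
    print(f"{call}: {first} runs before {second}")
```

The first two rows produce the same pair, and so do the last two, which is why each ordering can be written in either direction.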
Key Takeaways
- Use set_upstream to make one task run before another in Airflow DAGs.
- Calling task2.set_upstream(task1) means task1 runs before task2.
- set_upstream and set_downstream are two ways to set task order; use them carefully to avoid confusion.
- Do not set dependencies between tasks in different DAGs.
- Proper task dependencies ensure your workflow runs in the correct order.