
How to Use set_upstream in Airflow for Task Dependencies

In Airflow, set_upstream is a task method that declares another task as an upstream dependency: a task that must finish before the current one starts. Calling task2.set_upstream(task1) ensures that task1 runs before task2, which lets you control the order in which the tasks of your DAG execute.

Syntax

The set_upstream method is called on a task object to specify which task should run before it. The syntax is:

  • task.set_upstream(other_task): This means other_task must complete before task starts.

Here, task and other_task are Airflow tasks, that is, operator objects (such as BashOperator) defined in the same DAG.

python
task2.set_upstream(task1)
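The call updates the dependency graph from both ends. The sketch below is a toy model under that assumption: ToyTask is a hypothetical class written for illustration, not Airflow's BaseOperator, although real Airflow tasks do expose upstream_task_ids and downstream_task_ids properties with the same meaning.

```python
# Hypothetical toy model (NOT Airflow's implementation): each task keeps
# both its upstream and downstream neighbours, so one call updates both.
class ToyTask:
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_task_ids = set()
        self.downstream_task_ids = set()

    def set_upstream(self, other):
        # `other` must run before `self`
        self.upstream_task_ids.add(other.task_id)
        other.downstream_task_ids.add(self.task_id)


task1 = ToyTask("task1")
task2 = ToyTask("task2")
task2.set_upstream(task1)

print(task2.upstream_task_ids)    # {'task1'}
print(task1.downstream_task_ids)  # {'task2'}
```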

Example

This example shows two tasks where task1 runs before task2 using set_upstream. It demonstrates how to set dependencies in a DAG.

python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'start_date': datetime(2024, 1, 1),
}

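# `schedule` replaces the deprecated `schedule_interval` argument (Airflow 2.4+);
# catchup=False stops Airflow from backfilling every day since start_date.
dag = DAG('example_set_upstream', default_args=default_args, schedule='@daily', catchup=False)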

task1 = BashOperator(
    task_id='task1',
    bash_command='echo "Task 1 running"',
    dag=dag
)

task2 = BashOperator(
    task_id='task2',
    bash_command='echo "Task 2 running"',
    dag=dag
)

# Set task1 to run before task2
task2.set_upstream(task1)
Output
When the DAG runs, Airflow executes task1 first and starts task2 only after task1 has completed successfully (the default all_success trigger rule).
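Conceptually, the order Airflow enforces is a topological order of the dependency graph. The following minimal sketch uses Python's standard graphlib module, which is not part of Airflow, on a hypothetical mapping from each task_id to the task_ids upstream of it; the real scheduler additionally handles trigger rules, retries, and pools.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: task_id -> set of upstream task_ids.
# task2.set_upstream(task1) corresponds to the entry "task2": {"task1"}.
dependencies = {
    "task1": set(),
    "task2": {"task1"},
}


def run_order(deps):
    """Return one valid execution order, with upstream tasks first."""
    return list(TopologicalSorter(deps).static_order())


print(run_order(dependencies))  # ['task1', 'task2']
```

This model only captures order; it does not show what happens to task2 when task1 fails, which in Airflow is governed by the task's trigger rule.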

Common Pitfalls

Common mistakes when using set_upstream include:

  • Confusing set_upstream with set_downstream. set_upstream means the argument runs before the caller, while set_downstream means the argument runs after.
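  • Forgetting to declare a dependency. Airflow only orders tasks that are linked, so two tasks with no dependency between them may run in parallel or in either order.
  • Linking tasks that belong to different DAGs. Airflow raises an error in that case; dependencies can only be set between tasks of the same DAG.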
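The effect of a forgotten dependency is easy to see in a graph model. In this hypothetical example (the task names extract and load are invented for illustration), the dependency call was left out, so both tasks are ready at the same time; Python's standard graphlib module stands in for the scheduler here and is not Airflow code.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: "load" should wait for "extract", but the
# set_upstream call was forgotten, so the graph has no edge between them.
forgotten = TopologicalSorter({"extract": set(), "load": set()})
forgotten.prepare()
# get_ready() returns every task whose upstream tasks are all done
forgotten_first_batch = sorted(forgotten.get_ready())
print(forgotten_first_batch)  # ['extract', 'load']: both may start at once

# With the edge declared (the equivalent of load.set_upstream(extract))
fixed = TopologicalSorter({"extract": set(), "load": {"extract"}})
fixed.prepare()
fixed_first_batch = sorted(fixed.get_ready())
print(fixed_first_batch)  # ['extract']: load must wait
```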

Example of wrong and right usage:

python
# Wrong: task1.set_upstream(task2) means task2 runs before task1
# Right: task2.set_upstream(task1) means task1 runs before task2
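Swapping the caller and the argument reverses the order. This check uses a hypothetical ToyTask class (not Airflow code) together with Python's graphlib to compute the execution order that each call produces.

```python
from graphlib import TopologicalSorter


class ToyTask:
    """Hypothetical stand-in for an Airflow task; records upstream ids only."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()

    def set_upstream(self, other):
        # `other` runs before `self`
        self.upstream.add(other.task_id)


def order(*tasks):
    """Topological execution order for the given toy tasks."""
    graph = {t.task_id: t.upstream for t in tasks}
    return list(TopologicalSorter(graph).static_order())


# Wrong direction: task2 ends up upstream of task1
a1, a2 = ToyTask("task1"), ToyTask("task2")
a1.set_upstream(a2)
print(order(a1, a2))  # ['task2', 'task1']

# Right direction: task1 is upstream of task2
b1, b2 = ToyTask("task1"), ToyTask("task2")
b2.set_upstream(b1)
print(order(b1, b2))  # ['task1', 'task2']
```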

Quick Reference

Use set_upstream to declare that one task must finish before another starts. Calling task2.set_upstream(task1) is equivalent to calling task1.set_downstream(task2): both create the same task1 → task2 dependency.
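Airflow also offers the bitshift shorthand task1 >> task2 (or task2 << task1) for the same dependency. The toy model below, a hypothetical ToyTask class rather than Airflow's implementation, checks that set_upstream, set_downstream, and >> all produce the same edge. Its __rshift__ returns the right-hand task, which is what makes chains such as a >> b >> c work in Airflow.

```python
class ToyTask:
    """Hypothetical stand-in for an Airflow task (not BaseOperator)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_task_ids = set()
        self.downstream_task_ids = set()

    def set_upstream(self, other):
        # Edge: other -> self
        self.upstream_task_ids.add(other.task_id)
        other.downstream_task_ids.add(self.task_id)

    def set_downstream(self, other):
        # Edge: self -> other, declared from the other end
        other.set_upstream(self)

    def __rshift__(self, other):
        # Mirrors `a >> b`: make b downstream of a and return b for chaining
        self.set_downstream(other)
        return other


def edges(*tasks):
    """Set of (upstream_id, downstream_id) pairs."""
    return {(up, t.task_id) for t in tasks for up in t.upstream_task_ids}


x1, x2 = ToyTask("task1"), ToyTask("task2")
x2.set_upstream(x1)

y1, y2 = ToyTask("task1"), ToyTask("task2")
y1.set_downstream(y2)

z1, z2 = ToyTask("task1"), ToyTask("task2")
z1 >> z2

print(edges(x1, x2) == edges(y1, y2) == edges(z1, z2))  # True
print(edges(x1, x2))  # {('task1', 'task2')}
```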

Remember:

  • task2.set_upstream(task1): task1 runs before task2
  • task1.set_downstream(task2): task1 runs before task2 (same as above)

Method | Meaning
--- | ---
task2.set_upstream(task1) | task1 runs before task2
task1.set_downstream(task2) | task1 runs before task2
task1.set_upstream(task2) | task2 runs before task1 (usually a mistake)
task2.set_downstream(task1) | task2 runs before task1 (usually a mistake)

Key Takeaways

  • Use set_upstream to make one task run before another in an Airflow DAG.
  • Calling task2.set_upstream(task1) means task1 runs before task2.
  • set_upstream and set_downstream declare the same kind of dependency from opposite ends; pick one direction and use it consistently to avoid confusion.
  • Do not set dependencies between tasks in different DAGs.
  • Correct task dependencies ensure your workflow runs in the intended order.