
How to Use chain() in Apache Airflow for Task Dependencies

In Apache Airflow, the chain() function sets a sequence of task dependencies: you pass the tasks in the order they should run. It replaces repeated set_downstream() or set_upstream() calls, which keeps your DAG code cleaner and easier to read.

Syntax

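The chain() function takes tasks, or lists of tasks, as arguments and links them in the order provided: each argument is set downstream of the one before it. A single task next to a list is linked to every task in that list, while two adjacent lists must have the same length and are linked pairwise.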

Basic syntax:

chain(task1, task2, task3, ...)

Where:

  • task1, task2, task3 are Airflow tasks or lists of tasks.
  • Tasks are linked so task1 runs before task2, which runs before task3, and so on.
python
from airflow.models.baseoperator import chain

chain(task1, task2, task3)
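To make the linking rule concrete, here is a minimal sketch in plain Python that models how adjacent arguments could be paired up. It is not Airflow's implementation: the `Task` class and `chain_sketch` function are hypothetical stand-ins invented for illustration, and the sketch raises `ValueError` for mismatched lists where Airflow raises its own exception type.

```python
class Task:
    """Hypothetical stand-in for an Airflow task; it only records dependencies."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = set()  # task_ids that run after this task

    def set_downstream(self, others):
        # Accept a single task or a list of tasks.
        if isinstance(others, Task):
            others = [others]
        for other in others:
            self.downstream.add(other.task_id)


def chain_sketch(*items):
    """Link each argument to the next one (simplified model, not Airflow code)."""
    for up, down in zip(items, items[1:]):
        if isinstance(up, Task):
            # task -> task, or task -> list (fan-out)
            up.set_downstream(down)
        elif isinstance(down, Task):
            # list -> task (fan-in)
            for task in up:
                task.set_downstream(down)
        else:
            # list -> list: paired element by element, so lengths must match
            if len(up) != len(down):
                raise ValueError("adjacent lists must have the same length")
            for u, d in zip(up, down):
                u.set_downstream(d)


start, a, b, c, d, end = (Task(name) for name in ["start", "a", "b", "c", "d", "end"])
chain_sketch(start, [a, b], [c, d], end)

print(sorted(start.downstream))  # ['a', 'b']
print(sorted(a.downstream))      # ['c']
print(sorted(b.downstream))      # ['d']
print(sorted(c.downstream))      # ['end']
```

In this model, start fans out to a and b, the two lists are linked pairwise (a to c, b to d), and both c and d fan in to end. With real Airflow tasks the call would simply be chain(start, [a, b], [c, d], end).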

Example

This example shows how to use chain() to link three tasks in a DAG so they run one after another.

python
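from datetime import datetime

from airflow import DAG
from airflow.models.baseoperator import chain
from airflow.operators.bash import BashOperator

# A fixed start_date (illustrative value) replaces days_ago(), which is
# deprecated in newer Airflow releases.
default_args = {
    'start_date': datetime(2024, 1, 1),
}

# catchup=False avoids backfilling every day since start_date.
# Note: Airflow 2.4+ prefers the `schedule` argument over `schedule_interval`.
dag = DAG(
    'chain_example_dag',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False,
)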

# Define tasks
start = BashOperator(task_id='start', bash_command='echo Start', dag=dag)
process = BashOperator(task_id='process', bash_command='echo Processing', dag=dag)
end = BashOperator(task_id='end', bash_command='echo End', dag=dag)

# Use chain to set dependencies
chain(start, process, end)
Output
When the DAG runs, tasks execute in order: start → process → end.

Common Pitfalls

Common mistakes when using chain() include:

  • Passing tasks in the wrong order, which causes incorrect execution sequence.
  • Placing two lists of different lengths next to each other, which raises an error. Adjacent lists of equal length are linked pairwise, not all-to-all.
  • Not importing chain from airflow.models.baseoperator.

Example of a wrong usage and the correct fix:

python
# Wrong: tasks order reversed
chain(end, process, start)  # This runs end before process and start, which is wrong

# Right: correct order
chain(start, process, end)

Quick Reference

Function        | Description                          | Example
chain           | Links tasks in sequence              | chain(task1, task2, task3)
set_downstream  | Sets one task downstream of another  | task1.set_downstream(task2)
set_upstream    | Sets one task upstream of another    | task2.set_upstream(task1)

Key Takeaways

  • Use chain() to link multiple tasks in a clear, linear order.
  • Pass tasks to chain() in the exact order you want them to run.
  • Import chain from airflow.models.baseoperator before using it.
  • When passing lists, keep adjacent lists the same length; they are linked pairwise.
  • chain() simplifies DAG code by replacing multiple set_downstream() calls.