Airflow · How-To · Beginner · 3 min read

How to Pass Data Between Tasks in Airflow TaskFlow API

In Airflow TaskFlow API, you pass data between tasks by returning values from one task function and receiving them as parameters in downstream tasks using the @task decorator. Airflow automatically uses XCom to transfer this data behind the scenes, making it simple and clean to share information between tasks.
📐 Syntax

Use the @task decorator to define tasks as Python functions. Return a value from an upstream task, and accept it as a parameter in a downstream task function. Airflow handles the data transfer using XCom automatically.

Key parts:

  • @task: Marks a function as an Airflow task.
  • Return value: Data to pass to the next task.
  • Function parameter: Receives data from upstream task.
```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def example_dag():
    @task
    def task_a():
        # The return value is pushed to XCom automatically.
        return 'data from task A'

    @task
    def task_b(data):
        # The parameter receives the upstream task's XCom value.
        print(f'Received: {data}')

    # Calling task_a() yields an XCom reference, not the string itself.
    data = task_a()
    task_b(data)

example_dag_instance = example_dag()
```
💻 Example

This example shows two tasks: task_a returns a string, and task_b receives it as input and prints it. The data passes automatically via Airflow's XCom system.

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def pass_data_dag():
    @task
    def task_a():
        # Returning a dict works too — it just has to be serializable.
        return {'key': 'value'}

    @task
    def task_b(data):
        print(f'Received data: {data}')

    data = task_a()
    task_b(data)

pass_data_dag_instance = pass_data_dag()
```

Output:

```
Received data: {'key': 'value'}
```
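Under the hood, the default XCom backend serializes the return value (JSON by default in recent Airflow versions), stores it in the metadata database, and deserializes it for the downstream task. Conceptually, the handoff is equivalent to this plain-Python round trip (no Airflow required; the serialization step is a simplified sketch of what the backend does):

```python
import json

def task_a():
    # Upstream task: its return value becomes an XCom.
    return {'key': 'value'}

def task_b(data):
    # Downstream task: receives the deserialized XCom value.
    return f'Received data: {data}'

# Simulate the XCom handoff: serialize on push, deserialize on pull.
pushed = json.dumps(task_a())   # what the upstream worker would store
pulled = json.loads(pushed)     # what the downstream worker would read
print(task_b(pulled))           # prints: Received data: {'key': 'value'}
```

This round trip is also why only serializable values survive the handoff intact.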
⚠️ Common Pitfalls

Common mistakes when passing data between tasks include:

  • Not returning a value from the upstream task, so downstream tasks receive None.
  • Passing large or complex objects directly, which can hit XCom size limits or cause serialization errors.
  • Passing the task function itself instead of calling it to obtain its XCom reference.

Always return simple, serializable data (strings, dicts, lists) and call the task function to pass its output.

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def wrong_example():
    @task
    def task_a():
        print('No return here')  # returns None, so downstream gets None

    @task
    def task_b(data):
        print(f'Received: {data}')

    # Wrong: passes the task object itself instead of calling it
    task_b(task_a)

@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def right_example():
    @task
    def task_a():
        return 'correct data'

    @task
    def task_b(data):
        print(f'Received: {data}')

    # Right: call the upstream task and pass its output (an XCom reference)
    data = task_a()
    task_b(data)

wrong_example_instance = wrong_example()
right_example_instance = right_example()
```
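To check whether a value will survive the XCom handoff before wiring it into a DAG, you can test it against JSON serialization directly. The helper below is hypothetical (not part of Airflow's API) and assumes the default JSON-based XCom serialization of recent Airflow versions:

```python
import json

def is_xcom_safe(value):
    # Hypothetical helper: the default XCom backend serializes values
    # (JSON by default in recent Airflow), so JSON-compatible data is safe.
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False

print(is_xcom_safe({'rows': [1, 2, 3]}))  # True: plain dicts/lists/strings
print(is_xcom_safe(lambda x: x))          # False: function objects don't serialize
```

If a value fails this check, write it to external storage (object store, database) and pass only a reference, such as a path or key, through XCom.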
📊 Quick Reference

| Concept | Description | Example |
| --- | --- | --- |
| `@task` decorator | Marks a Python function as an Airflow task | `@task` above `def my_task(): ...` |
| Return value | Data returned is passed to downstream tasks via XCom | `def task_a(): return 'data'` |
| Passing data | Call the upstream task function and pass its output downstream | `task_b(task_a())` |
| XCom | Airflow's internal system for transferring data between tasks | Handled automatically in TaskFlow API |
| Data type | Use simple serializable types (str, dict, list) | `return {'key': 'value'}` |

Key Takeaways

  • Use the @task decorator and return values to pass data between tasks in the Airflow TaskFlow API.
  • Airflow transfers the data automatically via XCom behind the scenes.
  • Always call the upstream task function to get its output; never pass the function itself.
  • Return simple, serializable data to avoid XCom errors.
  • Avoid passing large or complex objects directly between tasks.