How to Pass Data Between Tasks in Airflow TaskFlow API
In the Airflow TaskFlow API, you pass data between tasks by returning values from one task function and receiving them as parameters in downstream tasks decorated with @task. Airflow automatically uses XCom to transfer this data behind the scenes, making it simple and clean to share information between tasks.
Syntax
Use the @task decorator to define tasks as Python functions. Return a value from an upstream task, and accept it as a parameter in a downstream task function. Airflow handles the data transfer using XCom automatically.
Key parts:
- @task: Marks a function as an Airflow task.
- Return value: Data to pass to the next task.
- Function parameter: Receives data from the upstream task.
```python
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@dag(start_date=days_ago(1), schedule_interval=None)
def example_dag():
    @task
    def task_a():
        return 'data from task A'

    @task
    def task_b(data):
        print(f'Received: {data}')

    data = task_a()
    task_b(data)

example_dag_instance = example_dag()
```
Example
This example shows two tasks: task_a returns a dictionary, and task_b receives it as input and prints it. The data passes automatically via Airflow's XCom system.
```python
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@dag(start_date=days_ago(1), schedule_interval=None, catchup=False)
def pass_data_dag():
    @task
    def task_a():
        return {'key': 'value'}

    @task
    def task_b(data):
        print(f'Received data: {data}')

    data = task_a()
    task_b(data)

pass_data_dag_instance = pass_data_dag()
```
Output
Received data: {'key': 'value'}
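Conceptually, the XCom handoff behaves like a key-value store keyed by task: the upstream task's return value is pushed, and the downstream parameter is pulled. The toy simulation below is not Airflow code; the `xcom_store` dict and `run_task` helper are illustrative stand-ins for what Airflow does internally when it runs @task functions.

```python
# A toy stand-in for the XCom table: task_id -> returned value.
xcom_store = {}

def run_task(task_id, func, *args):
    # Push the return value, as Airflow does automatically for @task functions.
    result = func(*args)
    xcom_store[task_id] = result
    return result

def task_a():
    return {'key': 'value'}

def task_b(data):
    return f"Received data: {data}"

a_out = run_task('task_a', task_a)
# Downstream task pulls the upstream value from the store.
b_out = run_task('task_b', task_b, xcom_store['task_a'])
print(b_out)  # Received data: {'key': 'value'}
```

In real TaskFlow code you never touch the store yourself: writing `task_b(task_a())` wires up both the dependency and the XCom push/pull.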
Common Pitfalls
Common mistakes when passing data between tasks include:
- Not returning a value from the upstream task, so downstream tasks receive None.
- Passing large or complex data directly, which can hit XCom size limits or cause serialization errors.
- Passing the task function itself instead of calling it to get the XCom reference.
Always return simple serializable data (like strings, dicts, lists) and call the task function to pass its output.
```python
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@dag(start_date=days_ago(1), schedule_interval=None)
def wrong_example():
    @task
    def task_a():
        print('No return here')

    @task
    def task_b(data):
        print(f'Received: {data}')

    # Wrong: passing the function object, not its output
    task_b(task_a)

@dag(start_date=days_ago(1), schedule_interval=None)
def right_example():
    @task
    def task_a():
        return 'correct data'

    @task
    def task_b(data):
        print(f'Received: {data}')

    data = task_a()
    task_b(data)

wrong_example_instance = wrong_example()
right_example_instance = right_example()
```
Quick Reference
| Concept | Description | Example |
|---|---|---|
| `@task` decorator | Marks a Python function as an Airflow task | `@task def my_task(): pass` |
| Return value | Data returned is passed to downstream tasks via XCom | `def task_a(): return 'data'` |
| Passing data | Call the upstream task function and pass its output to the downstream task | `data = task_a(); task_b(data)` |
| XCom | Airflow's internal system for transferring data between tasks | Handled automatically in TaskFlow API |
| Data type | Use simple serializable types (str, dict, list) | `return {'key': 'value'}` |
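In Airflow 2, the default XCom backend serializes values as JSON, so a quick way to check whether a return value is XCom-safe is to try serializing it yourself. The `is_xcom_safe` helper below is a hypothetical convenience for illustration, not part of Airflow's API:

```python
import json

def is_xcom_safe(value):
    """Rough check: can this value round-trip through JSON, the default XCom format?"""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False

print(is_xcom_safe({'key': 'value'}))  # True: plain dicts serialize fine
print(is_xcom_safe({1, 2, 3}))         # False: sets are not JSON-serializable
```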
Key Takeaways
- Use the @task decorator and return values to pass data between tasks in the Airflow TaskFlow API.
- Airflow uses XCom behind the scenes to transfer data automatically.
- Always call the upstream task function to get its output; never pass the function itself.
- Return simple serializable data to avoid XCom errors.
- Avoid passing large or complex objects directly between tasks.