How to Pass Data Between Tasks in Airflow: Simple Guide
In Airflow, you pass data between tasks using XComs, which are small messages stored in the Airflow metadata database. Tasks can push data with task_instance.xcom_push() and pull it with task_instance.xcom_pull() to share information during a workflow.
Syntax
Airflow uses XCom (short for cross-communication) to pass data between tasks. The main methods are:
- task_instance.xcom_push(key, value): Sends data from one task.
- task_instance.xcom_pull(task_ids, key): Retrieves data in another task.
Parameters:
- key: Identifier for the data.
- value: The data to share (must be serializable).
- task_ids: The task ID or list of task IDs to pull data from.
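```python
def push_function(ti):
    # Push a small value under an explicit key
    ti.xcom_push(key='sample_key', value='Hello from task 1')


def pull_function(ti):
    # Pull the value by naming the producing task and the key
    message = ti.xcom_pull(task_ids='push_task', key='sample_key')
    print(f"Received message: {message}")
```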
Example
This example shows two PythonOperator tasks: one pushes a message, the other pulls and prints it.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def push_function(ti):
    ti.xcom_push(key='sample_key', value='Hello from task 1')


def pull_function(ti):
    message = ti.xcom_pull(task_ids='push_task', key='sample_key')
    print(f"Received message: {message}")


# schedule_interval=None means the DAG only runs when triggered manually.
# On Airflow 2.4 and later, the preferred spelling is schedule=None.
with DAG(
    dag_id='xcom_example',
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    push_task = PythonOperator(
        task_id='push_task',
        python_callable=push_function,
    )
    pull_task = PythonOperator(
        task_id='pull_task',
        python_callable=pull_function,
    )

    # The producer must run before the consumer
    push_task >> pull_task
```
Output
Received message: Hello from task 1
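To try the push and pull logic without a running scheduler, the two callables can be exercised against a small in-memory stand-in. This is only a sketch: FakeTaskInstance and the store dictionary are hypothetical names, not part of Airflow, and they mimic just the two calls used in the example (real XComs are also scoped by DAG and run).

```python
class FakeTaskInstance:
    """Hypothetical stand-in for an Airflow task instance (not an Airflow class)."""

    def __init__(self, task_id, store):
        self.task_id = task_id
        self._store = store  # shared dict playing the role of the metadata database

    def xcom_push(self, key, value):
        # Store the value under (producing task ID, key)
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key):
        # Return None when nothing was pushed for this task ID and key
        return self._store.get((task_ids, key))


def push_function(ti):
    ti.xcom_push(key='sample_key', value='Hello from task 1')


def pull_function(ti):
    message = ti.xcom_pull(task_ids='push_task', key='sample_key')
    print(f"Received message: {message}")
    return message


store = {}
push_function(FakeTaskInstance('push_task', store))
received = pull_function(FakeTaskInstance('pull_task', store))
```

Because the stand-in keys values by task ID and key, pulling with a misspelled task ID or key returns None here as well, which makes typos easy to spot before deploying the DAG.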
Common Pitfalls
Common mistakes when passing data between tasks include:
- Trying to pass large data objects; XComs are meant for small messages.
- Not specifying key or task_ids correctly when pulling data.
- Assuming data is available before the upstream task finishes.
- Forgetting that XCom data must be serializable (e.g., JSON serializable).
Always ensure task dependencies are set so the data producer runs before the consumer.
```python
def wrong_pull(ti):
    # Missing task_ids causes None or wrong data
    message = ti.xcom_pull(key='sample_key')  # Wrong: no task_ids
    print(f"Message: {message}")

# Correct way:
# message = ti.xcom_pull(task_ids='push_task', key='sample_key')
```
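The serializability pitfall listed above can be caught before a value is pushed. The following is a minimal sketch that assumes plain JSON is the target format; is_json_serializable is a hypothetical helper, and Airflow's own XCom serialization may accept more types than this conservative check does.

```python
import json
from datetime import datetime


def is_json_serializable(value):
    """Return True if json.dumps accepts the value (hypothetical helper)."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


ok_payload = {"rows": 42, "status": "done"}
bad_payload = {"finished_at": datetime(2024, 1, 1)}  # json cannot encode datetime

ok_result = is_json_serializable(ok_payload)
bad_result = is_json_serializable(bad_payload)

# Converting to a string first gives a value that is safe to push
safe_payload = {"finished_at": bad_payload["finished_at"].isoformat()}
```

Converting values such as dates to strings or numbers before pushing keeps the XCom payload small and portable.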
Quick Reference
| Method | Purpose | Parameters |
|---|---|---|
| xcom_push | Send data from a task | key (str), value (serializable) |
| xcom_pull | Retrieve data in a task | task_ids (str or list), key (str) |
| set_upstream / set_downstream | Define task order | other task object |
Key Takeaways
Use XComs to pass small, serializable data between Airflow tasks.
Push data with task_instance.xcom_push and pull with task_instance.xcom_pull specifying task_ids and key.
Ensure task dependencies so the data producer runs before the consumer.
Avoid passing large data through XComs; use external storage for big data.
Always specify keys and task IDs correctly to avoid pulling wrong or no data.
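The takeaway about external storage can be sketched as follows: the producer writes the large payload somewhere both tasks can reach and shares only a short reference. Here a local temporary directory stands in for that shared storage, which is an assumption for illustration (in a real deployment the workers may not share a filesystem), and produce and consume are hypothetical function names.

```python
import json
import os
import tempfile


def produce(records, directory):
    """Write the large payload to a file and return only its path."""
    path = os.path.join(directory, "records.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle)
    return path  # this short string is what would be pushed as the XCom value


def consume(path):
    """Load the payload back from the path received via XCom."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as workdir:
    big_payload = [{"id": i, "value": i * i} for i in range(10_000)]
    reference = produce(big_payload, workdir)
    restored = consume(reference)
    row_count = len(restored)
```

In a DAG, the producing task would push the returned path with xcom_push and the consuming task would pass the pulled path to consume, so the metadata database only stores a short string.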