
How to Pass Data Between Tasks in Airflow: Simple Guide

In Airflow, you pass data between tasks using XComs: small messages that are stored, by default, in the Airflow metadata database. A task pushes a value with task_instance.xcom_push(), and a later task reads it with task_instance.xcom_pull().
📐

Syntax

Airflow uses XCom (short for cross-communication) to pass data between tasks. The main methods are:

  • task_instance.xcom_push(key, value): Sends data from one task.
  • task_instance.xcom_pull(task_ids, key): Retrieves data in another task.

Parameters:

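  • key: Identifier for the data. When pulling, key defaults to 'return_value', the key under which a task's return value is stored.
  • value: The data to share (must be serializable; JSON-serializable by default).
  • task_ids: The task ID or list of task IDs to pull data from.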
python
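# Airflow passes the running TaskInstance as `ti` when the callable declares it.
def push_function(ti):
    # Store a small value under an explicit key.
    ti.xcom_push(key='sample_key', value='Hello from task 1')

def pull_function(ti):
    # Read the value back by naming the producing task and the key.
    message = ti.xcom_pull(task_ids='push_task', key='sample_key')
    print(f"Received message: {message}")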
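The push and pull callables above can be tried without a running scheduler. The sketch below is a minimal stand-in, not Airflow's implementation: FakeTaskInstance and the shared store dictionary are hypothetical names invented here, and they imitate only the two XCom methods. The point is to show that a value is looked up by the pair (producing task ID, key).

```python
class FakeTaskInstance:
    """Hypothetical test double: imitates xcom_push/xcom_pull only."""

    def __init__(self, task_id, store):
        self.task_id = task_id
        self._store = store  # shared dict: (task_id, key) -> value

    def xcom_push(self, key, value):
        # File the value under the pushing task's ID and the chosen key.
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key="return_value"):
        # Return None when nothing was pushed for that (task, key) pair.
        return self._store.get((task_ids, key))


def push_function(ti):
    ti.xcom_push(key='sample_key', value='Hello from task 1')


def pull_function(ti):
    message = ti.xcom_pull(task_ids='push_task', key='sample_key')
    print(f"Received message: {message}")


store = {}  # stands in for the XCom table in the metadata database
push_function(FakeTaskInstance('push_task', store))
pull_function(FakeTaskInstance('pull_task', store))
```

Because the fake accepts only a single task ID, it does not cover pulling from a list of tasks; treat it as a quick check of your own callables rather than a model of Airflow's behavior.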
💻

Example

This example shows two PythonOperator tasks: one pushes a message, the other pulls and prints it.

python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def push_function(ti):
    ti.xcom_push(key='sample_key', value='Hello from task 1')

def pull_function(ti):
    message = ti.xcom_pull(task_ids='push_task', key='sample_key')
    print(f"Received message: {message}")

# Note: on Airflow 2.4 and later, schedule=None replaces the deprecated schedule_interval=None.
with DAG(dag_id='xcom_example', start_date=datetime(2024, 1, 1), schedule_interval=None, catchup=False) as dag:
    push_task = PythonOperator(
        task_id='push_task',
        python_callable=push_function
    )

    pull_task = PythonOperator(
        task_id='pull_task',
        python_callable=pull_function
    )

    push_task >> pull_task
Output
Received message: Hello from task 1
⚠️

Common Pitfalls

Common mistakes when passing data between tasks include:

  • Trying to pass large data objects; XComs are meant for small messages.
  • Not specifying key or task_ids correctly when pulling data.
  • Assuming data is available before the upstream task finishes.
  • Forgetting that XCom data must be serializable (e.g., JSON serializable).

Always ensure task dependencies are set so the data producer runs before the consumer.

python
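def wrong_pull(ti):
    # Wrong: without task_ids the pull is ambiguous and can return None
    # or a value pushed by a different task.
    message = ti.xcom_pull(key='sample_key')
    print(f"Message: {message}")

def correct_pull(ti):
    # Correct: name the producing task explicitly.
    message = ti.xcom_pull(task_ids='push_task', key='sample_key')
    print(f"Message: {message}")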
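The serialization and size pitfalls listed above can be caught before a value is pushed. The helper below is a sketch under stated assumptions: check_xcom_value is a hypothetical name, and the 48 KB budget is an arbitrary figure chosen for illustration, not a limit that Airflow enforces. It only tests whether a value survives json.dumps and how many bytes the result takes.

```python
import json

# Assumed budget for illustration only; not a limit enforced by Airflow.
MAX_XCOM_BYTES = 48 * 1024


def check_xcom_value(value, max_bytes=MAX_XCOM_BYTES):
    """Return (ok, reason) for a value about to be pushed to XCom."""
    try:
        encoded = json.dumps(value).encode("utf-8")
    except (TypeError, ValueError) as exc:
        # json.dumps raises TypeError for unsupported types such as set.
        return False, f"not JSON serializable: {exc}"
    if len(encoded) > max_bytes:
        return False, f"too large: {len(encoded)} bytes > {max_bytes}"
    return True, "ok"


print(check_xcom_value({"rows": 3, "status": "done"}))  # small dict passes
print(check_xcom_value({1, 2, 3}))                      # a set is rejected
print(check_xcom_value("x" * 100_000))                  # exceeds the assumed budget
```

A check like this could run inside the producing callable just before ti.xcom_push, so a bad value fails in the task that created it rather than in a downstream consumer.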
📊

Quick Reference

| Method | Purpose | Parameters |
|---|---|---|
| xcom_push | Send data from a task | key (str), value (serializable) |
| xcom_pull | Retrieve data in a task | task_ids (str or list), key (str) |
| set_upstream / set_downstream | Define task order | other task object |

Key Takeaways

  • Use XComs to pass small, serializable data between Airflow tasks.
  • Push data with task_instance.xcom_push and pull with task_instance.xcom_pull specifying task_ids and key.
  • Ensure task dependencies so the data producer runs before the consumer.
  • Avoid passing large data through XComs; use external storage for big data.
  • Always specify keys and task IDs correctly to avoid pulling wrong or no data.
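The advice to keep big data out of XComs usually means passing a reference instead of the payload. The sketch below illustrates that pattern with plain functions; produce and consume are hypothetical names, and a local temporary file stands in for whatever shared storage your workers can all reach (an assumption, since a local file only works when both tasks run on the same filesystem). Only the short path string would travel through XCom.

```python
import json
import os
import tempfile


def produce(rows):
    """Write the large payload to storage and return only its location."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)
    return path  # this short string is what would be pushed to XCom


def consume(path):
    """Load the payload back from the location received via XCom."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


rows = [{"id": i, "value": i * i} for i in range(1000)]
path = produce(rows)       # producer task: store data, hand over the path
loaded = consume(path)     # consumer task: pull the path, read the data
print(len(loaded), loaded[3])
os.remove(path)            # clean up the temporary file
```

In a DAG, the producer would push the path (or simply return it) and the consumer would pull it with xcom_pull before reading the data.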