
Limitations of XCom in Airflow: What You Need to Know

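In Apache Airflow, XCom is meant for passing small pieces of data between tasks. Values are stored in the metadata database, so their practical size is limited (about 48KB is a commonly cited ceiling; the hard limit depends on the database backend), and every value must be serializable (JSON by default in Airflow 2.x, pickle only if XCom pickling is enabled). Large or complex data should be stored externally, as XCom is not designed for heavy data transfer or long-term storage.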

Syntax

The XCom feature in Airflow allows tasks to share small pieces of data using the xcom_push() and xcom_pull() methods.

  • task_instance.xcom_push(key, value): Sends data with a key.
  • task_instance.xcom_pull(task_ids, key): Retrieves data by key from a specific task.

This data is stored in Airflow's metadata database and must be serializable (JSON by default in Airflow 2.x).

python
def push_function(ti):
    ti.xcom_push(key='sample_key', value='small_data')

def pull_function(ti):
    data = ti.xcom_pull(task_ids='push_task', key='sample_key')
    print(f"Pulled data: {data}")
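
Because pushed values must be serializable, it can help to check a value before handing it to xcom_push(). The following is a minimal sketch that assumes the default JSON serialization of Airflow 2.x; is_json_serializable is a hypothetical helper, not part of Airflow, and Airflow's own encoder may accept a few more types than plain json.dumps, so treat the check as conservative.

```python
import json


def is_json_serializable(value):
    """Return True if the standard json module can encode the value.

    Hypothetical helper for illustration; not an Airflow API.
    """
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type (set, file handle, custom object)
        # ValueError: circular reference
        return False


print(is_json_serializable({"rows": 10, "path": "/tmp/out.csv"}))  # True
print(is_json_serializable({1, 2, 3}))  # False: sets are not JSON types
```

A task could call this helper first and fall back to pushing a reference (for example a file path) when the check fails.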

Example

This example shows how to push and pull a small string using XCom in Airflow tasks.

python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def push_function(ti):
    ti.xcom_push(key='message', value='Hello from XCom!')

def pull_function(ti):
    message = ti.xcom_pull(task_ids='push_task', key='message')
    print(f"Pulled message: {message}")

# schedule_interval works on Airflow 2.x; from Airflow 2.4 the preferred name is `schedule`.
with DAG(
    'xcom_limitations_example',
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    push_task = PythonOperator(
        task_id='push_task',
        python_callable=push_function
    )
    pull_task = PythonOperator(
        task_id='pull_task',
        python_callable=pull_function
    )

    push_task >> pull_task
Output (in the pull_task log)
Pulled message: Hello from XCom!

Common Pitfalls

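1. Size Limit: XCom values live in the Airflow metadata database, so they should stay small. About 48KB is a commonly cited ceiling; the hard limit depends on the database backend and its column type. Pushing large values can raise errors on some backends and bloats the database on others.

2. Serialization Issues: By default in Airflow 2.x, values must be JSON-serializable; pickle is used only when XCom pickling is enabled in the configuration. Either way, objects such as open file handles cannot be passed.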

3. Not for Long-Term Storage: XCom is meant for short-lived data sharing between tasks, not for persistent storage.

4. Performance Impact: Excessive use of XCom with large data can slow down the scheduler and database.

python
def wrong_push(ti):
    large_data = 'x' * 100000  # about 100KB, well above the ~48KB guideline
    # May fail on some database backends; on others it succeeds but bloats the metadata DB
    ti.xcom_push(key='large', value=large_data)

def right_push(ti):
    # Instead, store large data externally and push a reference
    ti.xcom_push(key='file_path', value='/tmp/large_data.txt')
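
To decide between the two approaches above, a task can measure the serialized size of a value before pushing it. This is a rough sketch: the 48KB threshold is simply the guideline quoted in this article (not an Airflow setting), the size is measured with plain JSON encoding, and the names serialized_size, fits_in_xcom, and XCOM_SIZE_GUIDELINE_BYTES are made up for illustration.

```python
import json

# Assumed guideline taken from the ~48KB figure above; not an Airflow configuration value.
XCOM_SIZE_GUIDELINE_BYTES = 48 * 1024


def serialized_size(value):
    """Return the size in bytes of the value once JSON-encoded as UTF-8."""
    return len(json.dumps(value).encode("utf-8"))


def fits_in_xcom(value, limit=XCOM_SIZE_GUIDELINE_BYTES):
    """Return True if the JSON-encoded value is within the size guideline."""
    return serialized_size(value) <= limit


print(fits_in_xcom("small_data"))   # True
print(fits_in_xcom("x" * 100000))   # False: about 100KB once encoded
```

When fits_in_xcom() returns False, the value is a candidate for external storage, with only its location pushed through XCom.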

Quick Reference

XCom Limitations Summary:

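  • Keep each XCom entry small (about 48KB is the commonly cited ceiling; the hard limit depends on the database).
  • Data must be serializable (JSON by default; pickle only if enabled).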
  • Not suitable for large or binary data.
  • Use external storage (S3, DB, files) for big data.
  • Use XCom for small metadata or signals only.
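
The "external storage plus reference" pattern from the list above might look like the sketch below. It uses a local JSON file purely for illustration; save_payload and load_payload are hypothetical helpers, and a local path only works when the pushing and pulling tasks run on the same machine. With distributed workers, shared storage such as S3 would take the place of the file.

```python
import json
import os
import tempfile


def save_payload(payload, directory=None):
    """Write the payload to a new JSON file and return its path.

    The returned path is the small reference that would go through XCom.
    """
    fd, path = tempfile.mkstemp(suffix=".json", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    return path


def load_payload(path):
    """Read back a payload previously written by save_payload."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demo: round-trip a larger payload through a file, then clean up.
rows = {"rows": list(range(1000))}
reference = save_payload(rows)
assert load_payload(reference) == rows
os.remove(reference)
```

Inside a DAG, the pushing task would call ti.xcom_push(key='file_path', value=save_payload(rows)), and the pulling task would pass the result of ti.xcom_pull(task_ids='push_task', key='file_path') to load_payload().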

Key Takeaways

  • XCom is meant for small, serializable data (roughly 48KB or less per entry).
  • Avoid passing large or complex objects via XCom to prevent errors.
  • Use external storage for large data and pass references through XCom.
  • XCom is designed for short-term task communication, not long-term storage.
  • Excessive or large XCom usage can degrade Airflow performance.