What is XCom in Airflow: Simple Explanation and Usage
XCom in Airflow stands for "cross-communication" and is a way for tasks to share small pieces of data during a workflow run. It allows one task to push data that another task can pull later, enabling communication between tasks in a Directed Acyclic Graph (DAG).
How It Works
Think of XCom as a message board inside Airflow where tasks can leave notes for each other. When a task finishes some work and wants to share a small result, it "pushes" this data to the board. Later, another task can "pull" this data from the board to use it.
This lets tasks exchange results without being coupled to each other directly, keeping the workflow flexible and organized. Shared values should be small and serializable, such as strings, numbers, or simple lists and dictionaries, not large files.
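The message-board idea can be sketched in plain Python. This is a toy model only, not how Airflow implements XCom: the `MessageBoard` class and its method names are hypothetical, and a dictionary stands in for the storage that Airflow manages. It shows that a value is filed under the task that pushed it plus a key, and that a reader asks for it by those same two names.

```python
class MessageBoard:
    """Toy stand-in for XCom: notes are filed by (task_id, key)."""

    def __init__(self):
        self._notes = {}

    def push(self, task_id, key, value):
        # The pushing task leaves a note under its own task_id and a key.
        self._notes[(task_id, key)] = value

    def pull(self, task_id, key):
        # A later task looks the note up by the pusher's task_id and the key.
        # Returns None when nothing was pushed under that pair.
        return self._notes.get((task_id, key))


board = MessageBoard()
board.push("push_task", "message", "Hello from task 1")
received = board.pull("push_task", "message")
print(f"Received message: {received}")
```

Because the reader only needs the task name and the key, neither task has to know how the other one is implemented.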
Example
This example shows two tasks: one pushes a message to XCom, and the other pulls it and prints it.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def push_function(ti):
    # Store a small value in XCom under the key 'message'.
    ti.xcom_push(key='message', value='Hello from task 1')


def pull_function(ti):
    # Read the value that 'push_task' stored under the key 'message'.
    message = ti.xcom_pull(key='message', task_ids='push_task')
    print(f'Received message: {message}')


default_args = {
    'start_date': datetime(2024, 1, 1),
}

# schedule_interval=None means the DAG runs only when triggered manually.
dag = DAG('xcom_example', default_args=default_args, schedule_interval=None)

push_task = PythonOperator(
    task_id='push_task',
    python_callable=push_function,
    dag=dag
)

pull_task = PythonOperator(
    task_id='pull_task',
    python_callable=pull_function,
    dag=dag
)

# push_task must finish before pull_task so the value exists when it is pulled.
push_task >> pull_task
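A common variation on the example above, sketched under the assumption that the tasks are still wired up with `PythonOperator` and its default settings: the value a callable returns is pushed to XCom automatically under the key `return_value`, and `xcom_pull` looks for that key when none is given. Only the two callables change; the DAG and operator definitions stay the same.

```python
def push_function():
    # With PythonOperator's default settings, the returned value is
    # pushed to XCom under the key 'return_value'.
    return 'Hello from task 1'


def pull_function(ti):
    # No key is passed, so xcom_pull falls back to 'return_value'.
    message = ti.xcom_pull(task_ids='push_task')
    print(f'Received message: {message}')
```

An explicit `xcom_push` with a named key is still useful when a task needs to share more than one value.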
When to Use
Use XCom when you need to pass small data between tasks in the same workflow run. For example, if one task downloads data and another task processes it, the first task can push the file path or a summary to XCom for the second task to use.
It is not meant for large data or files; for that, use external storage like databases or cloud storage. XCom is great for sharing simple flags, IDs, or small results that help tasks coordinate.
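One way to follow this advice is to write the large payload to storage and share only its location. The sketch below uses a local temporary file as a stand-in for external storage; the helper names `stage_payload` and `load_payload` are hypothetical, and in a real deployment the location would need to be reachable by every worker (for example, a cloud storage URI rather than a local path).

```python
import json
import os
import tempfile


def stage_payload(records):
    """Write records to a file and return its path (the small value to share)."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(records, handle)
    return path


def load_payload(path):
    """Read back the records that stage_payload wrote."""
    with open(path) as handle:
        return json.load(handle)


# The upstream task would push only `path` to XCom, not the records.
records = [{"id": i, "value": i * i} for i in range(1000)]
path = stage_payload(records)

# The downstream task would pull `path` and load the data itself.
restored = load_payload(path)

# Clean up the temporary file once the data has been read.
os.remove(path)
```

The value that travels through XCom here is a short string, regardless of how large the staged data grows.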
Key Points
- XCom stands for cross-communication between tasks.
- It allows pushing and pulling small data pieces during a DAG run.
- By default, data is stored in Airflow's metadata database.
- Not suitable for large files or heavy data.
- Helps tasks coordinate and share results easily.