
XCom size limitations and alternatives in Apache Airflow - Commands & Configuration

Introduction
Airflow uses XComs (cross-communications) to share small pieces of data between tasks. XCom values are serialized and stored in the Airflow metadata database, so their practical size limit depends on the database backend, and passing large data through them causes problems. Knowing how to handle this keeps your workflows fast and reliable. Typical situations where this matters:
When you want to pass a small result like a file name or status message from one task to another.
When you try to share large data like big files or big JSON objects and get errors or slow performance.
When you want to keep your Airflow database fast and avoid it filling up with large data.
When you want to use external storage to share big data between tasks.
When you want to improve reliability by avoiding database timeouts caused by large XComs.
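There is no single universal XCom size cap: the limit comes from the column type your metadata database uses to store the serialized value. A defensive pattern is to measure the serialized payload before pushing it. The helper below is a sketch, not Airflow API; the 1 MB threshold is an illustrative assumption, not an official limit.

```python
import json

# Conservative ceiling for an XCom payload; the real limit depends on the
# metadata database backend, so this 1 MB figure is an illustrative assumption.
MAX_XCOM_BYTES = 1024 * 1024

def safe_to_push(value) -> bool:
    """Return True if the JSON-serialized value is small enough for XCom."""
    payload = json.dumps(value).encode("utf-8")
    return len(payload) <= MAX_XCOM_BYTES

print(safe_to_push({"path": "s3://my-airflow-bucket/large_data.json"}))  # small metadata: True
print(safe_to_push({"rows": ["x" * 100] * 20000}))  # ~2 MB of data: False
```

A check like this can run at the end of a task before it returns a value, so oversized payloads fail loudly in the task rather than slowly degrading the metadata database.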
Commands
Reads the XCom value pushed by the task 'example_task' in the DAG 'example_dag'. Note that Airflow's CLI has no xcom_pull subcommand: XCom values are pulled inside task code with ti.xcom_pull() or inspected in the UI under Admin > XComs. To see what a task pushes from the command line, run it with 'airflow tasks test', which runs the task in isolation and logs the returned value.
Terminal
airflow tasks test example_dag example_task 2024-06-01
Expected Output
[...] INFO - Done. Returned value was: small_result_value
example_dag - The DAG id
example_task - The task id
2024-06-01 - The logical (execution) date to run the task for
Runs the task 'example_task' locally to generate and push an XCom value. Useful to test XCom behavior.
Terminal
airflow tasks run example_dag example_task 2024-06-01T00:00:00 --local
Expected Output
[2024-06-01 00:00:00,000] {taskinstance.py:1234} INFO - Task succeeded
--local - Runs the task locally instead of submitting it to the executor, handy for testing
Uploads a large data file to external storage (S3) to avoid storing it in XCom.
Terminal
aws s3 cp large_data.json s3://my-airflow-bucket/large_data.json
Expected Output
upload: ./large_data.json to s3://my-airflow-bucket/large_data.json
Pushes the S3 path as a small XCom value instead of the large data itself; downstream tasks then read the file from S3. The Airflow CLI has no xcom_push subcommand either: a task pushes XComs from Python, either by returning a value (stored automatically under the key 'return_value') or with an explicit xcom_push call.
Python (inside the task)
ti.xcom_push(key="large_data_path", value="s3://my-airflow-bucket/large_data.json")
key - The XCom key name
value - The XCom value (here, just the S3 path)
A downstream task retrieves the path with ti.xcom_pull(task_ids="example_task", key="large_data_path") and reads the file from S3 itself.
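In a real DAG, both halves of this pattern live in task code rather than the CLI. The sketch below shows its shape with a plain dictionary standing in for S3, so it stays self-contained; in production you would upload with boto3 or Airflow's S3 hook, and Airflow itself would carry the returned path between tasks as the XCom. All names here are illustrative assumptions.

```python
# Stand-in for S3: maps object URIs to bytes. In production these would be
# boto3 / S3 hook calls; the dict keeps the sketch runnable on its own.
FAKE_S3 = {}

def produce_large_data() -> str:
    """Upload the big payload to (fake) S3 and return only its URI.
    In a DAG, the returned string is what Airflow stores as the XCom."""
    big_payload = b"x" * 10_000_000  # 10 MB: far too large for an XCom
    uri = "s3://my-airflow-bucket/large_data.json"
    FAKE_S3[uri] = big_payload
    return uri  # tiny value, safe to pass between tasks

def consume_large_data(uri: str) -> int:
    """Downstream task: receive the URI via XCom and read the data itself
    from storage, never from the Airflow database."""
    data = FAKE_S3[uri]
    return len(data)

path = produce_large_data()
print(path)                      # s3://my-airflow-bucket/large_data.json
print(consume_large_data(path))  # 10000000
```

The design point is that the 10 MB payload never touches the metadata database; only the short URI string does.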
Key Concept

If you remember nothing else, remember: XComs are for small data only; use external storage for large data and pass references via XCom.

Common Mistakes
Mistake: Pushing large files or big JSON objects directly into XCom.
Why it hurts: The Airflow metadata database slows down or fails because XCom storage is size-limited.
Fix: Store large data in external storage like S3 or a database, then push only the file path or reference as an XCom.
Mistake: Ignoring XCom size limits and expecting all data to pass smoothly.
Why it hurts: Large values cause task failures and long delays in the Airflow UI and scheduler.
Fix: Design tasks to share only small metadata or pointers via XCom.
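If this pattern comes up constantly, Airflow 2.0+ also lets you swap the XCom storage layer itself with a custom XCom backend, configured via the core xcom_backend option: the backend class writes each value to object storage and keeps only a reference in the database, so tasks keep using XComs unchanged. The class path below is a hypothetical example of where your own subclass would go, not a class that ships with Airflow.

```ini
[core]
; Hypothetical custom backend class; you implement it by subclassing
; airflow.models.xcom.BaseXCom and overriding its (de)serialization hooks.
xcom_backend = my_company.xcom_backends.S3XComBackend
```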
Summary
XComs in Airflow are designed to pass small pieces of data between tasks.
Large data should be stored outside Airflow, such as in cloud storage, and only references passed via XCom.
Push and pull XCom values deliberately, keeping payloads to metadata and pointers, to avoid size-limit issues.