Apache Airflow · DevOps · ~10 min read

XCom size limitations and alternatives in Apache Airflow - Step-by-Step Execution

Process Flow - XCom size limitations and alternatives
Task pushes XCom → Check XCom size → Store → Task pulls XCom → Process Data
This flow shows how Airflow tasks push data to XCom, check size limits, and either store it or use alternatives before pulling and processing.
Execution Sample
Apache Airflow
def push_large_data(ti):
    # Build a ~1.2MB string -- well over the ~48KB XCom size limit
    data = 'x' * 1_200_000
    ti.xcom_push(key='large_data', value=data)
This code pushes a 1.2MB string to XCom, which exceeds the default size limit, so the push fails.
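The size check and fallback can be sketched in plain Python, with dicts standing in for XCom and S3 (names like MAX_XCOM_BYTES, external_store, and the s3:// path are illustrative, not Airflow APIs):

```python
MAX_XCOM_BYTES = 48 * 1024           # commonly cited default limit (~48KB)
external_store = {}                  # stand-in for S3 or a database

def push_with_fallback(xcom, key, value):
    """Push small values to XCom directly; store large ones externally
    and push only a pointer (the storage path)."""
    size = len(value.encode()) if isinstance(value, str) else len(str(value))
    if size <= MAX_XCOM_BYTES:
        xcom[key] = value            # fits: store directly in XCom
    else:
        path = f"s3://bucket/{key}"  # illustrative pointer
        external_store[path] = value # store the 1.2MB payload externally
        xcom[key] = path             # XCom only holds the small pointer

xcom = {}
push_with_fallback(xcom, 'large_data', 'x' * 1_200_000)
print(xcom['large_data'])            # prints the pointer, not the payload
```

The same check could live in a wrapper around `ti.xcom_push` inside a real task.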
Process Table
Step | Action | Data Size (bytes) | Result | Notes
1 | Task tries to push 1.2MB data to XCom | 1200000 | Fail | Exceeds default ~48KB limit
2 | Fallback: store data in external storage (e.g., S3) | 1200000 | Success | Data stored externally
3 | Push XCom with pointer (e.g., S3 path) | 100 | Success | Small pointer stored in XCom
4 | Task pulls XCom pointer | 100 | Success | Pointer retrieved
5 | Task fetches large data from external storage | 1200000 | Success | Data retrieved for processing
6 | Process data | 1200000 | Success | Data processed
7 | End | - | - | Execution complete
💡 XCom push fails due to size limit; alternative external storage used with pointer in XCom
Status Tracker
Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 5 | Final
data | '' | 'x'*1200000 | 'x'*1200000 stored externally | S3 path string | S3 path string | 'x'*1200000 fetched | 'x'*1200000 processed
XCom_value | None | Fail to store large data | None | S3 path string stored | S3 path string retrieved | S3 path string | S3 path string
Key Moments - 3 Insights
Why does pushing large data directly to XCom fail?
Because XCom has a default size limit (~48KB). Step 1 in the execution_table shows the push fails due to exceeding this limit.
How can large data be passed between tasks if XCom size is limited?
By storing the large data externally (like in S3) and pushing only a small pointer or path in XCom, as shown in steps 2 and 3.
What does the task do to retrieve the large data after getting the pointer from XCom?
It uses the pointer from XCom to fetch the large data from external storage before processing, as shown in steps 4 and 5.
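The pull side of steps 4 and 5 can be sketched the same way, again with dicts as stand-ins for XCom and external storage (all names are illustrative):

```python
def pull_and_fetch(xcom, key, store):
    """Pull the XCom value; if it is a storage pointer, dereference it
    to get the real payload before processing."""
    value = xcom[key]
    if isinstance(value, str) and value.startswith("s3://"):
        return store[value]          # fetch the large payload externally
    return value                     # small values come straight from XCom

store = {"s3://bucket/large_data": "x" * 1_200_000}
xcom = {"large_data": "s3://bucket/large_data"}
data = pull_and_fetch(xcom, "large_data", store)
print(len(data))                     # 1200000
```

In a real DAG the lookup would be `ti.xcom_pull(key=...)` followed by a read from S3 or a database.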
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table: at which step does the XCom push fail due to size?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Check the 'Result' column for failure in step 1.
According to variable_tracker, what is stored in XCom after step 3?
A. The full 1.2MB data string
B. An empty string
C. A small pointer string (e.g., S3 path)
D. None
💡 Hint
Look at the 'XCom_value' row after step 3 in variable_tracker.
If the data size were under 48KB, how would the execution_table change?
A. Step 1 would fail and steps 2 and 3 would run
B. Step 1 would succeed and steps 2 and 3 would be skipped
C. Step 5 would fail
D. Step 4 would fail
💡 Hint
Refer to the exit_note and steps showing failure due to size.
Concept Snapshot
XCom stores small data between Airflow tasks.
Default size limit is ~48KB.
Large data push fails and needs alternatives.
Use external storage (S3, DB) for big data.
Push a small pointer in XCom instead.
Tasks pull pointer, then fetch large data externally.
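Airflow can also do this swap transparently via a custom XCom backend. The following is a plain-Python simulation of that idea only; real backends subclass Airflow's BaseXCom, and the class and method names here are illustrative assumptions:

```python
class PointerXComBackend:
    """Simulated custom XCom backend: values over the limit are swapped
    for a pointer on write and dereferenced on read."""
    LIMIT = 48 * 1024                # ~48KB, mirroring the default limit

    def __init__(self):
        self.store = {}              # stand-in for S3

    def serialize_value(self, key, value):
        # Large string: park it externally, return only the pointer
        if isinstance(value, str) and len(value) > self.LIMIT:
            path = f"s3://bucket/{key}"
            self.store[path] = value
            return path
        return value                 # small value: store as-is

    def deserialize_value(self, stored):
        # Dereference pointers; pass small values through unchanged
        return self.store.get(stored, stored)

backend = PointerXComBackend()
small = backend.serialize_value("a", "tiny")
big = backend.serialize_value("b", "x" * 100_000)
print(small, big)                    # tiny s3://bucket/b
```

With a real backend configured, tasks keep using plain `xcom_push`/`xcom_pull` and never see the pointer indirection.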
Full Transcript
In Airflow, tasks communicate using XCom, which has a size limit of about 48KB. If a task tries to push data larger than this limit, the push fails. To handle large data, the task can store the data externally, such as in S3, and push only a small pointer or path to that data in XCom. The downstream task then pulls this pointer from XCom and fetches the large data from the external storage before processing it. This approach avoids XCom size limitations and ensures smooth data passing between tasks.
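The whole flow described in the transcript can be simulated end to end in plain Python (dicts stand in for the XCom table and S3; all paths and names are illustrative):

```python
XCOM_LIMIT = 48 * 1024
xcom, s3 = {}, {}                      # stand-ins for XCom and S3

# Step 1: a direct push of a 1.2MB payload would exceed the limit
data = 'x' * 1_200_000
assert len(data) > XCOM_LIMIT          # push would be rejected

# Steps 2-3: store the payload externally, push only the small pointer
s3['s3://bucket/large_data'] = data
xcom['large_data'] = 's3://bucket/large_data'

# Steps 4-5: downstream task pulls the pointer and fetches the payload
pointer = xcom['large_data']
fetched = s3[pointer]

# Step 6: process the data
assert fetched == data
print(len(fetched))                    # 1200000
```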