Pushing and pulling XCom values in Apache Airflow - Time & Space Complexity
When working with Airflow, tasks often share small pieces of data called XComs.
We want to understand how the time to push and pull these values changes as the number of XComs grows.
Analyze the time complexity of pushing and pulling XCom values in Airflow tasks.
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

def push_xcom(ti):
    # One xcom_push call per value: 100 separate writes.
    for i in range(100):
        ti.xcom_push(key=f"key_{i}", value=i)

def pull_xcom(ti):
    values = []
    # One xcom_pull call per key: 100 separate reads from the 'push' task.
    for i in range(100):
        values.append(ti.xcom_pull(task_ids='push', key=f"key_{i}"))

with DAG(dag_id='xcom_example', start_date=days_ago(1), schedule_interval=None) as dag:
    push = PythonOperator(task_id='push', python_callable=push_xcom)
    pull = PythonOperator(task_id='pull', python_callable=pull_xcom)
    # The pull task runs only after the push task has finished.
    push >> pull
This code pushes 100 XCom values in one task and then pulls the same 100 values back in a second, downstream task.
Look at the loops that repeat operations:
- Primary operation: A single xcom_push call (in the push task) or a single xcom_pull call (in the pull task).
- How many times: Each task runs its loop once per execution, and each loop makes 100 calls.
As the number of XCom values (n) increases, the number of push or pull operations grows directly with n.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 push or pull calls |
| 100 | 100 push or pull calls |
| 1000 | 1000 push or pull calls |
Pattern observation: The work grows in a straight line as you add more XCom values.
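The counts in the table can be reproduced with a small sketch that does not need an Airflow installation. `FakeTaskInstance` and `count_calls` are hypothetical names introduced only for this illustration: the class is an in-memory stand-in that counts calls, not Airflow's real `TaskInstance`, and it says nothing about how long each real call takes.

```python
class FakeTaskInstance:
    """Hypothetical in-memory stand-in for a task instance (not the Airflow API)."""

    def __init__(self):
        self.store = {}
        self.calls = 0  # total number of xcom_push / xcom_pull calls made

    def xcom_push(self, key, value):
        self.calls += 1
        self.store[key] = value

    def xcom_pull(self, key):
        self.calls += 1
        return self.store.get(key)


def count_calls(n):
    """Push n values, pull them back, and return (push_calls, pull_calls)."""
    ti = FakeTaskInstance()
    for i in range(n):
        ti.xcom_push(key=f"key_{i}", value=i)
    push_calls = ti.calls

    values = [ti.xcom_pull(key=f"key_{i}") for i in range(n)]
    pull_calls = ti.calls - push_calls

    # Every pushed value should come back unchanged.
    assert values == list(range(n))
    return push_calls, pull_calls


for n in (10, 100, 1000):
    print(n, count_calls(n))
```

For each n, the sketch reports n push calls and n pull calls, which matches the rows of the table.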
Time Complexity: O(n)
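This means the time to push or pull XCom values grows in direct proportion to how many values you handle, assuming each individual call takes roughly constant time. With the default XCom backend, each value is stored in Airflow's metadata database, so every call is a separate read or write.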
[X] Wrong: "Pushing or pulling many XComs happens instantly no matter how many there are."
[OK] Correct: Each push or pull is a separate operation, so more values mean more work and more time.
Understanding how data sharing scales across Airflow tasks helps you reason about workflow efficiency and resource use.
"What if we changed the code to push all values in one single XCom instead of many? How would the time complexity change?"
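One way to explore this question is to compare call counts for the two approaches. The sketch below is illustrative only: `CountingTaskInstance`, `push_separately`, `push_batched`, `compare`, and the key `all_values` are hypothetical names, and the class is an in-memory stand-in rather than Airflow's real task instance.

```python
class CountingTaskInstance:
    """Hypothetical stand-in for a task instance; it only counts XCom calls."""

    def __init__(self):
        self.store = {}
        self.push_calls = 0
        self.pull_calls = 0

    def xcom_push(self, key, value):
        self.push_calls += 1
        self.store[key] = value

    def xcom_pull(self, key):
        self.pull_calls += 1
        return self.store[key]


def push_separately(ti, n):
    # One call per value, as in the original example.
    for i in range(n):
        ti.xcom_push(key=f"key_{i}", value=i)


def push_batched(ti, n):
    # A single call carrying all n values in one list.
    ti.xcom_push(key="all_values", value=list(range(n)))


def compare(n):
    """Return (separate push calls, batched push calls, items in the batched payload)."""
    separate, batched = CountingTaskInstance(), CountingTaskInstance()
    push_separately(separate, n)
    push_batched(batched, n)
    pulled = batched.xcom_pull(key="all_values")
    return separate.push_calls, batched.push_calls, len(pulled)


print(compare(100))
```

Under these assumptions the number of calls drops from n to 1, but the single payload still holds n items, so the work of serializing and storing it still grows with n. XComs are also intended for small pieces of data, so one very large value may not be a good trade.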