Sharing data between tasks effectively in Apache Airflow - Step-by-Step Execution

Process Flow - Sharing data between tasks effectively
Task 1 runs
Push data to XCom
Task 2 waits
Task 2 pulls data from XCom
Task 2 uses data
Workflow continues
Task 1 produces data and pushes it to XCom. Task 2 waits, then pulls that data from XCom to use it.
Execution Sample
Apache Airflow
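from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def push_data(ti):
    # Store 'hello' in XCom under the key 'my_key'
    ti.xcom_push(key='my_key', value='hello')

def pull_data(ti):
    # Read the value pushed by the task whose task_id is 'push_data'
    data = ti.xcom_pull(key='my_key', task_ids='push_data')
    print(f"Pulled data: {data}")

# The DAG id and start_date below are illustrative values
with DAG('xcom_example', start_date=datetime(2024, 1, 1), catchup=False) as dag:
    push = PythonOperator(task_id='push_data', python_callable=push_data)
    pull = PythonOperator(task_id='pull_data', python_callable=pull_data)
    push >> pull  # push_data runs first so the value exists when pull_data reads it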
Task push_data sends 'hello' to XCom; pull_data retrieves and prints it.
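The two callables can also be exercised without a scheduler. The sketch below is only an illustration: `FakeTaskInstance` is a hypothetical stand-in written for this example, not an Airflow class, and it mimics just the two method signatures used in the sample. Here `pull_data` also returns the value so the result can be inspected.

```python
class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's task instance; not part of Airflow."""

    def __init__(self, task_id, store):
        self.task_id = task_id
        self.store = store  # shared dict shaped like {task_id: {key: value}}

    def xcom_push(self, key, value):
        # File the value under the pushing task's id, then under the key
        self.store.setdefault(self.task_id, {})[key] = value

    def xcom_pull(self, key, task_ids):
        # Look up the value by pushing task id and key; None if absent
        return self.store.get(task_ids, {}).get(key)


def push_data(ti):
    ti.xcom_push(key='my_key', value='hello')


def pull_data(ti):
    data = ti.xcom_pull(key='my_key', task_ids='push_data')
    print(f"Pulled data: {data}")
    return data  # returned here only so the example can check it


store = {}
push_data(FakeTaskInstance('push_data', store))
pulled = pull_data(FakeTaskInstance('pull_data', store))
```

After both calls, `store` has the same shape as the XCom State column in the Process Table.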
Process Table
| Step | Task | Action | XCom State | Output |
| --- | --- | --- | --- | --- |
| 1 | push_data | Runs and pushes 'hello' with key 'my_key' | {'push_data': {'my_key': 'hello'}} | No output |
| 2 | pull_data | Pulls 'hello' from XCom using key 'my_key' | {'push_data': {'my_key': 'hello'}} | Pulled data: hello |
| 3 | Workflow | Continues to next tasks | {'push_data': {'my_key': 'hello'}} | No output |
💡 All tasks completed; data shared successfully via XCom.
Status Tracker
| Variable | Start | After push_data | After pull_data | Final |
| --- | --- | --- | --- | --- |
| XCom | {} | {'push_data': {'my_key': 'hello'}} | {'push_data': {'my_key': 'hello'}} | {'push_data': {'my_key': 'hello'}} |
| data in pull_data | None | None | 'hello' | 'hello' |
Key Moments - 2 Insights
Why does pull_data need to specify task_ids='push_data' when pulling data?
Because XCom stores values per task instance, pull_data names the task whose data it wants to retrieve, as shown in step 2 of the Process Table.
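To see why the task id matters, consider two tasks that push under the same key. The snippet below is a toy model only: it uses a plain dictionary keyed by (task id, key) instead of Airflow's XCom backend, and `other_task` and the misspelled `push_dta` are hypothetical names introduced for illustration.

```python
xcom = {}  # toy model of XCom: (task_id, key) -> value


def xcom_push(task_id, key, value):
    xcom[(task_id, key)] = value


def xcom_pull(task_ids, key):
    # Returns None when nothing was pushed under that task id and key
    return xcom.get((task_ids, key))


xcom_push('push_data', 'my_key', 'hello')
xcom_push('other_task', 'my_key', 'goodbye')  # same key, different task

from_push_data = xcom_pull(task_ids='push_data', key='my_key')
from_other_task = xcom_pull(task_ids='other_task', key='my_key')
misspelled = xcom_pull(task_ids='push_dta', key='my_key')  # wrong task id
```

The key alone cannot tell the two values apart, and a wrong task id yields `None` rather than an error, so a typo in `task_ids` is easy to miss.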
Can tasks share data without using XCom?
Yes, through external storage such as a shared file system, a database, or an object store, which is the usual choice for large data. XCom is Airflow's built-in mechanism and is intended for small values, as demonstrated in the flow.
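Because XCom is meant for small data, a common pattern for larger payloads is to write them to shared storage and pass only a reference through XCom. The sketch below illustrates the idea in plain Python under stated assumptions: a dictionary stands in for XCom, a temporary directory stands in for storage that both tasks can reach, and the names `produce`, `consume`, and `rows_path` are hypothetical.

```python
import json
import os
import tempfile

xcom = {}  # stand-in for XCom: (task_id, key) -> value


def produce(workdir):
    # Write the large payload to storage, then share only its location
    path = os.path.join(workdir, 'rows.json')
    with open(path, 'w') as f:
        json.dump([{'id': i} for i in range(1000)], f)
    xcom[('produce', 'rows_path')] = path


def consume():
    # Pull the small reference, then load the payload from storage
    path = xcom[('produce', 'rows_path')]
    with open(path) as f:
        return len(json.load(f))


with tempfile.TemporaryDirectory() as workdir:
    produce(workdir)
    row_count = consume()
```

Only the path string travels through the XCom stand-in; the 1000 rows stay in the file. In a real deployment the two tasks may run on different workers, so the storage would need to be reachable from both.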
Visual Quiz - 3 Questions
Test your understanding
Look at step 1 of the Process Table. What key-value pair is pushed to XCom?
A. 'my_key': 'world'
B. 'data': 'hello'
C. 'my_key': 'hello'
D. 'key': 'hello'
💡 Hint
Check the 'XCom State' column in step 1 of the Process Table.
At which step does pull_data retrieve the data from XCom?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Look at the 'Action' column describing pull_data's behavior.
If push_data pushed a different key, what must pull_data change to get the data?
A. Change the key parameter in xcom_pull
B. Change the task_ids parameter
C. Change the DAG id
D. No change needed
💡 Hint
Refer to the 'key' argument in the xcom_pull call in the Execution Sample code.
Concept Snapshot
Airflow tasks share data using XCom.
Task 1 pushes data with ti.xcom_push(key, value).
Task 2 pulls data with ti.xcom_pull(key, task_ids).
XCom stores data per task instance.
Use keys to identify data.
This enables effective data sharing between tasks.
Full Transcript
In Airflow, tasks share data through a feature called XCom. One task runs first and pushes data to XCom under a key. A downstream task then pulls that data from XCom using the same key and the task id of the pushing task. This lets tasks pass small pieces of data to each other explicitly. The Process Table shows step by step how push_data stores 'hello' in XCom and pull_data retrieves it, and the Status Tracker shows how XCom and the data variable in pull_data change along the way. When pulling data, specify the task id that pushed it and the key it used. XCom is Airflow's standard built-in way to share small values between tasks.