Apache Airflow · devops · ~10 mins

Mapped tasks for parallel processing in Apache Airflow - Step-by-Step Execution

Process Flow - Mapped tasks for parallel processing
Define base task
Create list of inputs
Map base task over inputs
Run tasks in parallel
Collect results
Complete DAG
This flow shows how a single task is defined and then mapped over multiple inputs to run many tasks in parallel, speeding up processing.
Execution Sample
Apache Airflow
from airflow import DAG
from airflow.decorators import task
from datetime import datetime

with DAG('mapped_tasks_dag', start_date=datetime(2024, 1, 1), schedule=None) as dag:
    @task
    def process_item(item):
        # Base task: doubles a single input item.
        return item * 2

    # expand() creates one mapped task instance per element of items.
    items = [1, 2, 3]
    results = process_item.expand(item=items)
This Airflow DAG defines a task that doubles a number, then runs that task in parallel for each number in the list.
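Conceptually, `expand()` fans the task out: one task instance per element of `items`, each running `process_item` with a single element, with the outputs collected in input order. A plain-Python sketch of that semantics (outside Airflow, sequential rather than parallel, just to illustrate):

```python
# Plain-Python illustration of what expand() does conceptually:
# one "task instance" per input element, outputs collected in order.

def process_item(item):
    # Same logic as the @task function in the DAG above.
    return item * 2

items = [1, 2, 3]

# Airflow would schedule these instances to run in parallel;
# here we run them sequentially just to show the mapping.
results = [process_item(item) for item in items]

print(results)  # [2, 4, 6]
```

In the real DAG, `results` is not a plain list but a reference to the mapped task's outputs, which downstream tasks can consume once all instances finish.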
Process Table
Step | Action | Input | Task Instance | Output | Notes
1 | Define task function | None | process_item | Function ready | Task process_item is defined
2 | Create input list | None | N/A | [1, 2, 3] | List of items to process
3 | Map task over inputs | [1, 2, 3] | process_item[1] | Scheduled | Task instance for item=1 created
4 | Map task over inputs | [1, 2, 3] | process_item[2] | Scheduled | Task instance for item=2 created
5 | Map task over inputs | [1, 2, 3] | process_item[3] | Scheduled | Task instance for item=3 created
6 | Run task instance | 1 | process_item[1] | 2 | Item 1 doubled to 2
7 | Run task instance | 2 | process_item[2] | 4 | Item 2 doubled to 4
8 | Run task instance | 3 | process_item[3] | 6 | Item 3 doubled to 6
9 | Collect results | N/A | N/A | [2, 4, 6] | All mapped tasks completed
10 | DAG complete | N/A | N/A | Success | All tasks finished successfully
💡 All mapped task instances finished; the DAG run completes successfully
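The "Collect results" row corresponds to a downstream fan-in step that receives the outputs of all mapped instances as a single list. A plain-Python sketch of that step (the `collect` function and its summing logic are illustrative; in Airflow it would be another `@task` called on the mapped results):

```python
# Sketch of the fan-in step: a downstream "collect" task receives
# the outputs of all mapped instances as one list.

def process_item(item):
    return item * 2

def collect(values):
    # Illustrative downstream task: in Airflow this would be
    # another @task, e.g. collect(process_item.expand(item=items)).
    return sum(values)

mapped_outputs = [process_item(i) for i in [1, 2, 3]]  # [2, 4, 6]
total = collect(mapped_outputs)
print(total)  # 12
```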
Status Tracker
Variable | Start | After Step 2 | After Steps 3-5 | After Steps 6-8 | Final
items | None | [1, 2, 3] | [1, 2, 3] | [1, 2, 3] | [1, 2, 3]
results | None | None | None | [2, 4, 6] | [2, 4, 6]
Key Moments - 3 Insights
Why do we see multiple task instances with the same task name but different inputs?
Because the task is mapped over a list of inputs, Airflow creates a separate task instance for each input so they can run in parallel, as shown in steps 3 to 5.
How does Airflow know when all mapped tasks are done?
Airflow tracks each task instance separately and considers the mapped task complete only when all instances finish successfully, as seen in step 9.
What happens if one mapped task instance fails?
The failing instance is retried if retries are configured; if it still fails after exhausting its retries, the DAG run is marked as failed, because every mapped instance must succeed for the DAG to complete.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the output of process_item[2] at step 7?
A) 6
B) 2
C) 4
D) None
💡 Hint
Check the Step 7 row of the execution table, under the Output column
At which step are all mapped task instances scheduled but not yet run?
A) Step 5
B) Step 2
C) Step 6
D) Step 9
💡 Hint
Look at steps 3 to 5 of the execution table, where the task instances are Scheduled
If the input list changed to [1, 2, 3, 4], how would the execution table change?
A) The DAG would complete earlier
B) There would be an extra mapped task instance for item=4
C) The output of process_item[3] would be 8
D) No change in task instances
💡 Hint
Mapping creates one task instance per input item; see steps 3-5
Concept Snapshot
Mapped tasks let you run the same task code on many inputs at once.
Define a task function, create a list of inputs, then use expand() to map.
Airflow runs each mapped task instance in parallel.
All instances must finish for the DAG to succeed.
This speeds up processing by using parallelism.
Full Transcript
Mapped tasks in Airflow allow running the same task multiple times with different inputs in parallel. First, you define a base task function. Then, you create a list of inputs. Using the expand() method, Airflow creates separate task instances for each input. These run at the same time, speeding up the workflow. The DAG completes only after all mapped tasks finish successfully. This visual trace showed each step from defining the task, scheduling mapped instances, running them, and collecting results.