Apache Airflow · devops · ~10 mins

Mapped tasks for parallel processing in Apache Airflow - Step-by-Step Execution

Process Flow - Mapped tasks for parallel processing
Define base task
Create list of inputs
Map base task over inputs
Run tasks in parallel
Collect results
Complete DAG
This flow shows how a single task is defined and then mapped over multiple inputs to run many tasks in parallel, speeding up processing.
Execution Sample
Apache Airflow
from airflow import DAG
from airflow.decorators import task
from datetime import datetime

with DAG('mapped_tasks_dag', start_date=datetime(2024, 1, 1), schedule=None) as dag:
    @task
    def process_item(item):
        # Base task: doubles a single input item.
        return item * 2

    # expand() creates one mapped task instance per element of items.
    items = [1, 2, 3]
    results = process_item.expand(item=items)
This Airflow DAG defines a task that doubles a number, then runs that task in parallel for each number in the list.
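Conceptually, `expand()` fans the task out: one task instance per element of `items`, each running `process_item` with a single element, with the outputs collected in input order. A plain-Python sketch of that semantics (outside Airflow, sequential rather than parallel, just to illustrate):

```python
# Plain-Python illustration of what expand() does conceptually:
# one "task instance" per input element, outputs collected in order.

def process_item(item):
    # Same logic as the @task function in the DAG above.
    return item * 2

items = [1, 2, 3]

# Airflow would schedule these instances to run in parallel;
# here we run them sequentially just to show the mapping.
results = [process_item(item) for item in items]

print(results)  # [2, 4, 6]
```

In the real DAG, `results` is not a plain list but a reference to the mapped task's outputs, which downstream tasks can consume once all instances finish.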
Process Table
Step | Action | Input | Task Instance | Output | Notes
1 | Define task function | None | process_item | Function ready | Task process_item is defined
2 | Create input list | None | N/A | [1, 2, 3] | List of items to process
3 | Map task over inputs | [1, 2, 3] | process_item[1] | Scheduled | Task instance for item=1 created
4 | Map task over inputs | [1, 2, 3] | process_item[2] | Scheduled | Task instance for item=2 created
5 | Map task over inputs | [1, 2, 3] | process_item[3] | Scheduled | Task instance for item=3 created
6 | Run task instance | 1 | process_item[1] | 2 | Item 1 doubled to 2
7 | Run task instance | 2 | process_item[2] | 4 | Item 2 doubled to 4
8 | Run task instance | 3 | process_item[3] | 6 | Item 3 doubled to 6
9 | Collect results | N/A | N/A | [2, 4, 6] | All mapped tasks completed
10 | DAG complete | N/A | N/A | Success | All tasks finished successfully
💡 All mapped task instances finished; the DAG run completes successfully
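The "Collect results" row corresponds to a downstream fan-in step that receives the outputs of all mapped instances as a single list. A plain-Python sketch of that step (the `collect` function and its summing logic are illustrative; in Airflow it would be another `@task` called on the mapped results):

```python
# Sketch of the fan-in step: a downstream "collect" task receives
# the outputs of all mapped instances as one list.

def process_item(item):
    return item * 2

def collect(values):
    # Illustrative downstream task: in Airflow this would be
    # another @task, e.g. collect(process_item.expand(item=items)).
    return sum(values)

mapped_outputs = [process_item(i) for i in [1, 2, 3]]  # [2, 4, 6]
total = collect(mapped_outputs)
print(total)  # 12
```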
Status Tracker
Variable | Start | After Step 2 | After Steps 3-5 | After Steps 6-8 | Final
items | None | [1, 2, 3] | [1, 2, 3] | [1, 2, 3] | [1, 2, 3]
results | None | None | None | [2, 4, 6] | [2, 4, 6]
Key Moments - 3 Insights
Why do we see multiple task instances with the same task name but different inputs?
Because the task is mapped over a list of inputs, Airflow creates a separate task instance for each input so they can run in parallel, as shown in steps 3 to 5.
How does Airflow know when all mapped tasks are done?
Airflow tracks each task instance separately and considers the mapped task complete only when all instances finish successfully, as seen in step 9.
What happens if one mapped task instance fails?
The failing instance is retried if retries are configured; if it still fails after exhausting its retries, the DAG run is marked as failed, because every mapped instance must succeed for the DAG to complete.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the output of process_item[2] at step 7?
A) 6
B) 2
C) 4
D) None
💡 Hint
Check the Step 7 row of the execution table, under the Output column
At which step are all mapped task instances scheduled but not yet run?
A) Step 5
B) Step 2
C) Step 6
D) Step 9
💡 Hint
Look at steps 3 to 5 of the execution table, where the task instances are Scheduled
If the input list changed to [1, 2, 3, 4], how would the execution table change?
A) The DAG would complete earlier
B) There would be an extra mapped task instance for item=4
C) The output of process_item[3] would be 8
D) No change in task instances
💡 Hint
Mapping creates one task instance per input item; see steps 3-5
Concept Snapshot
Mapped tasks let you run the same task code on many inputs at once.
Define a task function, create a list of inputs, then use expand() to map.
Airflow runs each mapped task instance in parallel.
All instances must finish for the DAG to succeed.
This speeds up processing by using parallelism.
Full Transcript
Mapped tasks in Airflow allow running the same task multiple times with different inputs in parallel. First, you define a base task function. Then, you create a list of inputs. Using the expand() method, Airflow creates separate task instances for each input. These run at the same time, speeding up the workflow. The DAG completes only after all mapped tasks finish successfully. This visual trace showed each step from defining the task, scheduling mapped instances, running them, and collecting results.