Mapped tasks for parallel processing in Apache Airflow - Time & Space Complexity
When using mapped tasks in Airflow, we want to know how the time to run tasks changes as we increase the number of items to process.
We ask: How does adding more items affect the total work done?
Analyze the time complexity of the following Airflow DAG snippet using mapped tasks.
```python
from airflow import DAG
from airflow.decorators import task
from datetime import datetime

def process_item(item):
    return item * 2

with DAG('mapped_task_dag', start_date=datetime(2024, 1, 1)) as dag:

    @task
    def multiply(item):
        return process_item(item)

    items = [1, 2, 3, 4, 5]
    multiply.expand(item=items)
```
This DAG runs `process_item` on each item in the list in parallel, creating one mapped task instance per item via `expand`.
Look for repeated work inside the mapped task.
- Primary operation: Running `process_item` on each item.
- How many times: Once per item in the input list.
Each new item adds one more task to run, so the total work grows directly with the number of items.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 tasks run |
| 100 | 100 tasks run |
| 1000 | 1000 tasks run |
Pattern observation: The work grows linearly as you add more items.
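The table above can be reproduced with a small pure-Python sketch. This is a simulation, not real Airflow: `run_mapped` is a hypothetical stand-in for the one-task-per-item scheduling that `expand` performs.

```python
def process_item(item):
    return item * 2

def run_mapped(items):
    # Model of mapped tasks: expand() creates one task instance
    # per item, so total work is one process_item call per item.
    return [process_item(item) for item in items]

for n in (10, 100, 1000):
    results = run_mapped(list(range(n)))
    # The number of "task runs" always equals n: linear growth.
    print(n, len(results))
```

Counting the calls for n = 10, 100, and 1000 gives exactly the operation counts in the table.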
Time Complexity: O(n)
This means the total work increases in direct proportion to the number of items processed.
[X] Wrong: "Mapped tasks run all items instantly with no extra time as we add more items."
[OK] Correct: Each item still needs its own task to run, so more items mean more total work, even if tasks run in parallel.
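To see why parallelism changes wall-clock time but not total work, here is a back-of-the-envelope model. The `waves` helper and the 16-slot figure are illustrative assumptions, not Airflow settings: with k parallel worker slots, n tasks run in roughly ceil(n / k) "waves", while the total work stays n.

```python
import math

def waves(n_tasks, k_workers):
    # Number of sequential "waves" needed to run n_tasks
    # when at most k_workers tasks execute at once.
    return math.ceil(n_tasks / k_workers)

print(waves(100, 16))  # 7 waves, even though total work is 100 tasks
```

Doubling n still doubles the total work; parallelism only divides the elapsed time by (at most) the number of available slots.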
Understanding how mapped tasks scale helps you explain how Airflow handles many parallel jobs efficiently, a useful skill for real-world workflows.
What if we changed the mapped task to process items sequentially inside one task? How would the time complexity change?
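One way to picture that sequential variant (a sketch, not the DAG above): a single task that loops over every item itself.

```python
def process_item(item):
    return item * 2

def multiply_all(items):
    # One task, one loop iteration per item: total work is
    # still O(n), but nothing runs in parallel, so wall-clock
    # time now also grows linearly with n.
    return [process_item(item) for item in items]

print(multiply_all([1, 2, 3, 4, 5]))  # [2, 4, 6, 8, 10]
```

The total work remains O(n) either way; what changes is that the mapped version can spread those n units of work across workers, while the sequential version pays for all of them in one task's elapsed time.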