Mapped tasks for parallel processing in Apache Airflow - Time & Space Complexity

Time Complexity: Mapped tasks for parallel processing: O(n)

Understanding Time Complexity

When using mapped tasks in Airflow, we want to know how the time to run tasks changes as we increase the number of items to process.

We ask: How does adding more items affect the total work done?

Scenario Under Consideration

Analyze the time complexity of the following Airflow DAG snippet using mapped tasks.

from datetime import datetime

from airflow import DAG
from airflow.decorators import task


def process_item(item):
    # Constant-time work done for a single item
    return item * 2


with DAG('mapped_task_dag', start_date=datetime(2024, 1, 1)) as dag:
    @task
    def multiply(item):
        return process_item(item)

    items = [1, 2, 3, 4, 5]
    # expand() creates one mapped task instance per element of items
    multiply.expand(item=items)

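This code creates one mapped task instance per item in the list, and each instance calls process_item on its own item. The instances are independent, so they can run in parallel; how many actually run at the same time depends on the executor and the concurrency limits configured for the Airflow deployment.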
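To get a feel for this fan-out without an Airflow installation, here is a minimal local sketch that uses Python's standard thread pool as a stand-in for mapped task instances. It is an analogy only, not how Airflow schedules tasks; run_mapped and max_workers are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(item):
    # Same per-item work as in the DAG snippet
    return item * 2


def run_mapped(items, max_workers=3):
    """Call process_item once per item on a small worker pool.

    Hypothetical local stand-in for mapped task instances; this is not Airflow code.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(process_item, items))


results = run_mapped([1, 2, 3, 4, 5])
print(results)  # [2, 4, 6, 8, 10]
```

The pool still makes five calls for five items. Parallelism changes when the calls happen, not how many there are.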

Identify Repeating Operations

Look for repeated work inside the mapped task.

  • Primary operation: Running process_item on each item.
  • How many times: Once per item in the input list.

How Execution Grows With Input

Each new item adds one more task to run, so the total work grows directly with the number of items.

Input Size (n)    Approx. Operations
10                10 tasks run
100               100 tasks run
1000              1000 tasks run

Pattern observation: The work grows linearly as you add more items.
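The counts in the table can be reproduced with a short plain-Python sketch that mimics the one-call-per-item behavior and tallies the calls. It assumes each item triggers exactly one run of process_item, as in the scenario; count_mapped_runs is a hypothetical helper, not part of Airflow.

```python
def process_item(item):
    return item * 2


def count_mapped_runs(items):
    """Mimic one task run per item and return (results, number_of_runs)."""
    runs = 0
    results = []
    for item in items:
        results.append(process_item(item))
        runs += 1  # one run per input item
    return results, runs


for n in (10, 100, 1000):
    _, runs = count_mapped_runs(list(range(n)))
    print(f"n={n}: {runs} task runs")
```

Multiplying the input size by ten multiplies the number of runs by ten, which is the linear pattern described above.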

Final Time Complexity

Time Complexity: O(n)

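This means the total work increases in direct proportion to the number of items processed. Wall-clock time can be shorter than the total work because task instances run concurrently, but with a fixed number of worker slots it still grows linearly with n.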
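A rough way to relate total work to elapsed time is to count "waves" of tasks. The sketch below assumes every task takes the same time, a fixed number of tasks can run at once, and scheduler overhead is ignored; estimated_waves and max_parallel are hypothetical names used only for this illustration.

```python
import math


def estimated_waves(n_items, max_parallel):
    """Estimate how many rounds of equally long tasks are needed.

    Assumes at most max_parallel tasks run at once and ignores scheduling overhead.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return math.ceil(n_items / max_parallel)


for n in (10, 100, 1000):
    print(n, estimated_waves(n, max_parallel=4))
```

Under these assumptions, elapsed time is roughly the number of waves times the duration of one task. For a fixed max_parallel that is n divided by a constant, so it remains O(n).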

Common Mistake

[X] Wrong: "Mapped tasks run all items instantly with no extra time as we add more items."

[OK] Correct: Each item still needs its own task to run, so more items mean more total work, even if tasks run in parallel.

Interview Connect

Understanding how mapped tasks scale helps you explain how Airflow handles many parallel jobs efficiently, a useful skill for real-world workflows.

Self-Check

What if we changed the mapped task to process items sequentially inside one task? How would the time complexity change?