Dynamic task generation with loops in Apache Airflow - Time & Space Complexity
When we create tasks dynamically in Airflow using loops, the number of tasks depends on the loop size.
We want to understand how the total work grows as we add more tasks.
Analyze the time complexity of the following code snippet.

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from datetime import datetime


    def generate_tasks(dag, task_count):
        # One BashOperator is created and attached to the DAG per iteration.
        for i in range(task_count):
            BashOperator(
                task_id=f'task_{i}',  # task ids must be unique within a DAG
                bash_command='echo Hello',
                dag=dag
            )


    with DAG('example_dag', start_date=datetime(2024, 1, 1)) as dag:
        generate_tasks(dag, 5)  # creates task_0 through task_4

This code uses a loop to create task_count BashOperator tasks inside a DAG (five in this example).
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Loop creating tasks
- How many times: task_count times
Each new task adds one more operation to create and register it.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 task creations |
| 100 | 100 task creations |
| 1000 | 1000 task creations |
Pattern observation: The work grows directly with the number of tasks.
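
The counts in the table can be reproduced with a small sketch that does not need Airflow installed. FakeDag, add_task and count_creations are hypothetical stand-ins introduced here for illustration. They only record task ids and are not part of Airflow's API.

```python
class FakeDag:
    """Hypothetical stand-in for a DAG: it only records registered task ids."""

    def __init__(self):
        self.task_ids = []

    def add_task(self, task_id):
        self.task_ids.append(task_id)


def generate_tasks(dag, task_count):
    # Same loop shape as the Airflow snippet: one registration per iteration.
    for i in range(task_count):
        dag.add_task(f"task_{i}")


def count_creations(task_count):
    # Build a fresh stand-in DAG and report how many tasks the loop registered.
    dag = FakeDag()
    generate_tasks(dag, task_count)
    return len(dag.task_ids)


for n in (10, 100, 1000):
    print(n, count_creations(n))
```

Each row printed pairs an input size with the same number of task creations, matching the table.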
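Time Complexity: O(n), where n is task_count
This means the time to create tasks grows linearly as you add more tasks. This cost is paid when Airflow parses the DAG file and builds the task objects, not when the tasks themselves run.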
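
One way to see where the linear bound comes from is to write the total creation time as a sum. This is a sketch under the assumption that each task creation costs roughly a constant amount c and that the fixed setup (imports, creating the DAG object) costs c_0. Both constants are illustrative symbols, not measured values, and n is the number of tasks.

$$
T(n) = c_0 + \sum_{i=0}^{n-1} c = c_0 + c\,n \le (c_0 + c)\,n \quad \text{for } n \ge 1, \qquad \text{so } T(n) = O(n).
$$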
[X] Wrong: "Creating tasks in a loop is instant no matter how many tasks there are."
[OK] Correct: Each task creation takes some time, so more tasks mean more total time.
Understanding how loops affect task creation helps you explain how Airflow scales with many tasks.
"What if we nested loops to create tasks inside tasks? How would the time complexity change?"