Kubernetes executor for dynamic scaling in Apache Airflow - Time & Space Complexity
When Airflow uses the KubernetesExecutor, the scheduler launches a fresh pod for each task instance. Understanding how the time to schedule and run tasks grows with the number of tasks tells us how well this dynamic scaling works.
We want to know: how does the time to handle tasks change as the number of tasks grows?
Analyze the time complexity of this Airflow Kubernetes executor snippet.
```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

def create_task(task_id, dag):
    return BashOperator(
        task_id=task_id,
        bash_command='echo Hello',
        dag=dag,
    )

dag = DAG('dynamic_scaling', start_date=datetime(2024, 1, 1), schedule_interval=None)

n = 100  # Number of tasks
tasks = [create_task(f'task_{i}', dag) for i in range(n)]

# In production, the Airflow scheduler executes these tasks with the
# KubernetesExecutor, launching one pod per task.
```
This code dynamically defines n tasks in a DAG. With the KubernetesExecutor, each task runs in its own pod.
Look at what repeats as tasks increase.
- Primary operation: Creating and executing each task, which involves launching a Kubernetes pod.
- How many times: Exactly n times, once per task.
As the number of tasks n grows, the total time to schedule and run all tasks grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 pod launches and executions |
| 100 | 100 pod launches and executions |
| 1000 | 1000 pod launches and executions |
Pattern observation: Doubling the number of tasks roughly doubles the total work done.
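This linear pattern can be sketched with a toy cost model. The per-pod overhead and task runtime below are illustrative assumptions, not Airflow measurements:

```python
# Toy cost model: total work grows linearly with the number of tasks.
POD_OVERHEAD_S = 2.0   # assumed per-pod scheduling + startup cost (seconds)
TASK_RUNTIME_S = 1.0   # assumed per-task runtime (seconds)

def total_work(n_tasks: int) -> float:
    """Total work across all pods: one launch plus one run per task."""
    return n_tasks * (POD_OVERHEAD_S + TASK_RUNTIME_S)

for n in (10, 100, 1000):
    print(n, total_work(n))

# Doubling the number of tasks doubles the total work: O(n).
assert total_work(200) == 2 * total_work(100)
```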
Time Complexity: O(n)
This means the total time to handle tasks grows linearly with the number of tasks.
[X] Wrong: "Launching many pods happens all at once instantly, so time does not grow with tasks."
[OK] Correct: Even though pods launch in parallel, each pod pays per-pod overhead (API-server scheduling, image pulls, container startup), and the scheduler and cluster process tasks with finite throughput, so total time still grows roughly with the number of tasks.
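One hedged way to see why parallelism does not make the cost constant: if the cluster runs at most k pods concurrently, tasks execute in waves, and the number of waves grows with n. The concurrency limit and per-pod time below are illustrative assumptions:

```python
import math

def wall_clock_time(n_tasks: int, concurrency: int,
                    per_pod_seconds: float = 3.0) -> float:
    """Pods run in waves of size `concurrency`; each wave costs roughly one
    pod's launch-plus-run time. per_pod_seconds is an assumed constant."""
    waves = math.ceil(n_tasks / concurrency)
    return waves * per_pod_seconds

# With 50 concurrent pods: 100 tasks need 2 waves, 1000 tasks need 20 waves.
print(wall_clock_time(100, 50))   # 6.0
print(wall_clock_time(1000, 50))  # 60.0
```

Once n exceeds the concurrency limit, wall-clock time scales as roughly n/k, which is still linear in n for a fixed cluster size.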
Understanding how dynamic scaling affects time helps you explain real-world system behavior clearly. This skill shows you can think about how systems grow and handle load.
What if we changed from launching one pod per task to batching multiple tasks in a single pod? How would the time complexity change?
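As a hedged sketch of the batching idea: if each pod runs b tasks, the number of pod launches drops to ceil(n/b), so the per-pod overhead is amortized, while the total task runtime remains O(n). The cost constants below are illustrative assumptions:

```python
import math

def pods_needed(n_tasks: int, batch_size: int) -> int:
    """Number of pods when batching `batch_size` tasks per pod."""
    return math.ceil(n_tasks / batch_size)

def total_cost(n_tasks: int, batch_size: int,
               pod_overhead: float = 2.0, task_runtime: float = 1.0) -> float:
    """Pod overhead is paid once per pod; runtime once per task.
    Constants are illustrative assumptions, not measurements."""
    return pods_needed(n_tasks, batch_size) * pod_overhead + n_tasks * task_runtime

# Batching cuts pod-launch overhead by a factor of ~b, but total time is still O(n):
print(total_cost(100, 1))   # 300.0 (100 pods)
print(total_cost(100, 10))  # 120.0 (10 pods)
```

Batching reduces the constant factor (fewer launches), but asymptotically the time complexity stays O(n) because every task still has to run.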