Apache Airflow · DevOps · ~10 mins

Kubernetes executor for dynamic scaling in Apache Airflow - Step-by-Step Execution

Process Flow - Kubernetes executor for dynamic scaling
Airflow Scheduler receives task
Scheduler sends task to Kubernetes Executor
Kubernetes Executor creates a Pod for the task
Pod runs the task
Task completes, Pod terminates
Kubernetes Executor reports task status back to Scheduler
Scheduler updates task state
If more tasks, repeat Pod creation
Kubernetes cluster scales Pods dynamically based on demand
The Kubernetes executor creates a new Pod for each task dynamically, runs the task inside it, and scales Pods up or down based on workload.
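The flow above can be sketched as a minimal simulation. This is illustrative plain Python, not Airflow's actual executor code; the class and method names (Pod, KubernetesExecutorSim, submit) are hypothetical:

```python
# Minimal simulation of the flow: one Pod per task, created on
# submission and deleted on completion. Names are illustrative only.

class Pod:
    def __init__(self, task_id):
        self.task_id = task_id
        self.state = "Pending"   # Pod starts Pending after creation

    def run(self):
        self.state = "Running"   # container executes the task command
        self.state = "Succeeded" # command exited successfully


class KubernetesExecutorSim:
    def __init__(self):
        self.pods = {}           # live Pods, keyed by task id
        self.task_states = {}    # what the scheduler would record

    def submit(self, task_id):
        pod = Pod(task_id)       # a new Pod for each task, never reused
        self.pods[task_id] = pod
        self.task_states[task_id] = "running"
        pod.run()
        self.task_states[task_id] = "success"
        del self.pods[task_id]   # Pod is cleaned up, freeing resources


executor = KubernetesExecutorSim()
executor.submit("print_date")
print(executor.task_states)  # {'print_date': 'success'}
print(executor.pods)         # {} — scaled back down to zero Pods
```

With no tasks left, the simulated executor holds no Pods, mirroring the scale-to-zero behavior in the last flow step.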
Execution Sample
Apache Airflow
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG('example', start_date=datetime(2024, 1, 1)) as dag:
    # Runs the 'date' shell command; the executor launches it in its own Pod
    t1 = BashOperator(task_id='print_date', bash_command='date')
This Airflow DAG defines a simple task that prints the date, which Kubernetes executor will run in a separate Pod.
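Conceptually, for this task the executor builds a Kubernetes Pod spec whose container runs the Airflow task command (`airflow tasks run <dag_id> <task_id> <run_id>`). A rough sketch of such a spec as a plain Python dict; the Pod name, image tag, and run id are hypothetical examples, not values Airflow guarantees:

```python
# Illustrative Pod spec the executor might generate for 'print_date'.
# Field names follow the Kubernetes Pod schema; concrete values are examples.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "print-date-pod"},      # hypothetical Pod name
    "spec": {
        "restartPolicy": "Never",                # one-shot task Pod
        "containers": [{
            "name": "base",
            "image": "apache/airflow:2.9.0",     # example image tag
            "args": ["airflow", "tasks", "run",  # the task command
                     "example", "print_date", "manual__2024-01-01"],
        }],
    },
}
print(pod_spec["spec"]["containers"][0]["args"][:3])
```

`restartPolicy: Never` matches the one-shot lifecycle in the table below: the Pod runs the task once and is never restarted, only cleaned up.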
Process Table
| Step | Action | Kubernetes Executor Behavior | Pod State | Scheduler State |
|------|--------|------------------------------|-----------|-----------------|
| 1 | Scheduler receives task 'print_date' | No Pod yet | No Pod | Task queued |
| 2 | Scheduler sends task to Kubernetes Executor | Creates Pod 'print_date-pod' | Pod created, Pending | Task running |
| 3 | Pod starts running task | Pod status changes to Running | Running | Task running |
| 4 | Task executes 'date' command inside Pod | Pod runs command | Running | Task running |
| 5 | Task completes successfully | Pod status changes to Succeeded | Succeeded | Task success |
| 6 | Pod terminates and is cleaned up | Pod deleted | No Pod | Task success |
| 7 | Scheduler ready for next task | Waits for next task | No Pod | Idle |
💡 With no more tasks to run, the Kubernetes executor scales down to zero Pods.
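The Pod's passage through the table can be expressed as a short state sequence (a sketch; the states are the standard Kubernetes Pod phases, the function name is illustrative):

```python
def pod_lifecycle():
    """Yield the states a one-shot task Pod passes through."""
    yield "Pending"    # step 2: Pod created by the executor
    yield "Running"    # steps 3-4: container executing the task command
    yield "Succeeded"  # step 5: task command exited successfully
    # Step 6: the Pod is deleted, so no further state is reported.

print(list(pod_lifecycle()))  # ['Pending', 'Running', 'Succeeded']
```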
Status Tracker
| Variable | Start | After Step 2 | After Step 3 | After Step 5 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| Pod State | None | Pending | Running | Succeeded | Deleted |
| Task State | Queued | Running | Running | Success | Success |
Key Moments - 3 Insights
Why does the Kubernetes executor create a new Pod for each task instead of reusing one?
Each task runs in its own Pod to isolate execution and allow dynamic scaling. See execution_table rows 2 and 3 where a new Pod is created and started for the task.
What happens to the Pod after the task completes?
The Pod status changes to 'Succeeded' and then the Pod is deleted to free resources, as shown in execution_table rows 5 and 6.
How does the Kubernetes executor handle multiple tasks arriving at once?
It creates multiple Pods dynamically, scaling up the cluster as needed. This is implied in the flow where each task triggers Pod creation (concept_flow).
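The third insight above, one Pod per task even when tasks arrive together, can be sketched in a few lines (hypothetical function and naming scheme):

```python
def create_pods(task_ids):
    """One new Pod per task; Pods are never shared or reused."""
    return {t: f"{t}-pod" for t in task_ids}

# Three tasks arriving at once produce three concurrent Pods.
pods = create_pods(["print_date", "extract", "load"])
print(len(pods))  # 3 — the cluster scales up to three Pods
```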
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution_table, what is the Pod state at Step 3?
A. Pending
B. Running
C. Succeeded
D. Deleted
💡 Hint
Check the 'Pod State' column at Step 3 in the execution_table.
At which step does the task state change to 'Success'?
A. Step 4
B. Step 6
C. Step 5
D. Step 7
💡 Hint
Look at the 'Task State' column in variable_tracker after Step 5.
If a new task arrives while a Pod is running, what does the Kubernetes executor do?
A. Creates a new Pod for the new task
B. Reuses the existing Pod
C. Queues the task until the Pod finishes
D. Fails the new task
💡 Hint
Refer to concept_flow where each task triggers Pod creation.
Concept Snapshot
The Kubernetes executor runs each Airflow task in its own Pod.
Pods are created dynamically when tasks start and deleted after completion.
This allows Airflow to scale tasks up or down based on workload.
Scheduler sends tasks to executor, executor manages Pods.
Pods isolate tasks and free resources when done.
Full Transcript
The Kubernetes executor in Airflow works by creating a new Pod for each task it receives from the scheduler. When the scheduler sends a task, the executor creates a Pod in the Kubernetes cluster to run that task. The Pod starts in a Pending state, then moves to Running while executing the task command. Once the task finishes successfully, the Pod status changes to Succeeded and the Pod is deleted to free resources. The scheduler updates the task state accordingly. This dynamic creation and deletion of Pods allows Airflow to scale the number of running tasks up or down based on demand, efficiently using cluster resources.