
Kubernetes executor for dynamic scaling in Apache Airflow - Commands & Configuration

Introduction
Running many Airflow tasks on a single machine can slow everything down, because every task competes for the same CPU and memory. The Kubernetes executor solves this by launching a dedicated Kubernetes pod for each task. Tasks run in isolation from one another, and the cluster scales up or down automatically with the number of queued tasks. Use the Kubernetes executor when:
When you have many Airflow tasks that need to run at the same time without slowing each other down
When you want Airflow to automatically add or remove resources based on workload
When you want to isolate tasks so one task's failure does not affect others
When you want to run Airflow tasks in a cloud or container environment easily
When you want to save costs by only using resources when tasks are running
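As a rough mental model (a toy sketch, not Airflow's actual implementation), the scaling behaviour can be pictured like this: each queued task gets its own fresh pod, and the number of pods created per scheduler loop is capped by `worker_pods_creation_batch_size` from airflow.cfg.

```python
# Toy model of KubernetesExecutor scaling: one pod per queued task,
# capped per scheduler loop. Illustration only -- real pod creation is
# done by the Airflow scheduler through the Kubernetes API.

def pods_needed(queued_tasks, batch_size=5):
    """Number of worker pods the executor would create this loop.

    batch_size mirrors worker_pods_creation_batch_size in airflow.cfg:
    at most that many pods are launched per scheduler heartbeat.
    """
    return min(len(queued_tasks), batch_size)

print(pods_needed(["extract", "transform", "load"]))  # 3: one pod per task
print(pods_needed(["task"] * 12))                     # 5: capped by batch size
```

When the queue drains, finished pods are deleted, so resource usage falls back toward zero between runs.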
Config File - airflow.cfg
airflow.cfg
[core]
executor = KubernetesExecutor

[kubernetes]
namespace = airflow
worker_container_repository = apache/airflow
worker_container_tag = 2.7.1

[kubernetes_executor]
worker_pods_creation_batch_size = 5

[logging]
remote_logging = False

[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5

[webserver]
web_server_port = 8080

This configuration file sets Airflow to use the Kubernetes executor under the [core] section. The [kubernetes] section defines the Kubernetes namespace and the Airflow worker image to use for task pods. The [kubernetes_executor] section controls how many pods can be created at once. Other sections configure logging, scheduler, and webserver settings.
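If the defaults baked into the worker image are not enough, the executor can also base every worker pod on a template file via the `pod_template_file` option. The path below is an example value, not a required location:

```ini
[kubernetes_executor]
# Optional: base all worker pods on a pod template (example path).
pod_template_file = /opt/airflow/pod_template.yaml
```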

Commands
Create a Kubernetes namespace called 'airflow' to isolate Airflow resources from other applications.
Terminal
kubectl create namespace airflow
Expected Output
namespace/airflow created
Apply a Kubernetes pod specification for Airflow workers so pods run with the correct settings. Save the example pod spec from the Airflow documentation as a local YAML file first (the filename below is illustrative).
Terminal
kubectl apply -f airflow-worker-pod.yaml
Expected Output
pod/airflow-worker created
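A minimal worker pod template might look like the sketch below. Names and resource values are illustrative; adjust them to your cluster. Note that the executor expects the worker container to be named "base":

```yaml
# Illustrative pod template for Airflow workers (example values).
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker
  namespace: airflow
spec:
  containers:
    - name: base          # KubernetesExecutor expects this container name
      image: apache/airflow:2.7.1
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
  restartPolicy: Never
```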
Start the Airflow scheduler which will use the Kubernetes executor to launch task pods dynamically.
Terminal
airflow scheduler
Expected Output
[2024-06-01 12:00:00,000] {scheduler_job.py:125} INFO - Starting the scheduler
[2024-06-01 12:00:05,000] {kubernetes_executor.py:200} INFO - KubernetesExecutor started
Check the status of Airflow worker pods running in the 'airflow' namespace to see dynamic scaling in action.
Terminal
kubectl get pods -n airflow
Expected Output
NAME                    READY   STATUS    RESTARTS   AGE
airflow-worker-abc123   1/1     Running   0          30s
airflow-worker-def456   1/1     Running   0          25s
-n airflow - Specifies the namespace to look for pods
Key Concept

If you remember nothing else from this pattern, remember: Kubernetes executor lets Airflow create a new pod for each task, so tasks run separately and scale automatically.

Common Mistakes
Not creating the Kubernetes namespace before starting Airflow
Why it hurts: Airflow tries to create pods in a namespace that does not exist, causing pod creation failures.
Fix: Always run 'kubectl create namespace airflow' before starting Airflow with the Kubernetes executor.
Using the wrong Airflow worker image tag in the configuration
Why it hurts: Worker pods may fail to start or behave unexpectedly if the image tag does not match the Airflow version.
Fix: Set 'worker_container_tag' in airflow.cfg to the exact Airflow version you are running, e.g., '2.7.1'.
Not checking pod status with 'kubectl get pods' to verify dynamic scaling
Why it hurts: You may think tasks are running while pods are actually stuck or failing without visible errors.
Fix: Regularly run 'kubectl get pods -n airflow' to monitor pod creation and status.
Summary
Set Airflow executor to KubernetesExecutor in airflow.cfg to enable dynamic task pod creation.
Create a Kubernetes namespace to isolate Airflow pods and resources.
Start the Airflow scheduler to launch tasks as separate pods that scale automatically.
Use 'kubectl get pods -n airflow' to monitor running task pods and verify scaling.