Apache Airflow · DevOps · ~15 mins

Kubernetes executor for dynamic scaling in Apache Airflow - Deep Dive

Overview - Kubernetes executor for dynamic scaling
What is it?
The Kubernetes executor is a way for Apache Airflow to run tasks by creating pods in a Kubernetes cluster. It dynamically launches a new pod for each task, allowing tasks to run independently and scale automatically. This means Airflow can handle many tasks at once without pre-allocating resources. It helps manage workloads efficiently by using Kubernetes' power to add or remove pods as needed.
Why it matters
Without dynamic scaling, Airflow would need fixed resources, which can waste capacity or cause delays when many tasks run. The Kubernetes executor solves this by adjusting resources on the fly, saving costs and speeding up workflows. This flexibility is crucial for businesses that have changing workloads and want to use cloud resources efficiently. It makes Airflow more powerful and responsive to real-world demands.
Where it fits
Before learning this, you should understand basic Airflow concepts like DAGs and tasks, and have a basic idea of Kubernetes pods and clusters. After mastering the Kubernetes executor, you can explore advanced Airflow scaling strategies, Kubernetes operators, and cloud-native workflow orchestration.
Mental Model
Core Idea
The Kubernetes executor runs each Airflow task in its own Kubernetes pod, creating and removing pods dynamically to match workload demand.
Think of it like...
Imagine a restaurant kitchen where each dish (task) is cooked in its own small cooking station (pod). When an order comes in, a new station is set up quickly, cooks the dish, and then the station is cleaned and removed, freeing space for the next order.
Airflow Scheduler
    │
    ▼
┌───────────────┐
│ Kubernetes    │
│ Executor      │
└───────────────┘
    │ creates pods dynamically
    ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Pod for Task1 │   │ Pod for Task2 │   │ Pod for Task3 │
└───────────────┘   └───────────────┘   └───────────────┘
    │ runs task        │ runs task        │ runs task
    ▼                  ▼                  ▼
Pods terminate after task completion, freeing resources.
Build-Up - 7 Steps
1. Foundation: Understanding Airflow Task Execution
🤔
Concept: Learn how Airflow runs tasks and what executors do.
Airflow runs workflows made of tasks. Each task needs a place to run. Executors decide how and where tasks run. The default executor runs tasks on the same machine, which limits scaling.
Result
You understand that executors control task execution and that scaling is limited with default executors.
Knowing how executors work is key to understanding why the Kubernetes executor improves scalability.
2. Foundation: Basics of Kubernetes Pods and Clusters
🤔
Concept: Learn what Kubernetes pods and clusters are and how they manage containers.
Kubernetes runs containers inside pods. A pod is the smallest unit and can hold one or more containers. Clusters are groups of machines where pods run. Kubernetes manages pods by creating, scaling, and deleting them.
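To make this concrete, here is a minimal pod manifest; the name, labels, and image are illustrative:

```yaml
# A minimal Kubernetes pod: the smallest deployable unit, wrapping a
# single container. Name, labels, and image are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  labels:
    app: demo
spec:
  containers:
    - name: main
      image: busybox:1.36
      command: ["echo", "hello from a pod"]
  restartPolicy: Never
```

Applying this with kubectl apply -f creates the pod on some cluster node; kubectl logs hello-pod then shows the container's output.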
Result
You can identify pods and understand their role in running containerized tasks.
Understanding pods is essential because the Kubernetes executor uses pods to run Airflow tasks.
3. Intermediate: How the Kubernetes Executor Launches Pods
🤔 Before reading on: do you think the Kubernetes executor runs all tasks in one pod or in separate pods? Commit to your answer.
Concept: The executor creates a new pod for each task dynamically.
When Airflow schedules a task, the Kubernetes executor asks Kubernetes to create a pod just for that task. The pod runs the task container, then shuts down when done. This means tasks run isolated and can scale independently.
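The one-pod-per-task idea can be sketched in plain Python. This is a simplified model (pod specs as dicts, illustrative names and image); the real executor builds full pod objects through the Kubernetes API client:

```python
# Simplified sketch of what the Kubernetes executor does for each task:
# build a pod spec naming that one task, submit it, let it run to
# completion. Pod specs are modeled as plain dicts here; the real
# executor uses the Kubernetes API client and a far richer spec.

def build_task_pod_spec(dag_id: str, task_id: str, image: str) -> dict:
    """Return a pod spec for one Airflow task (one pod per task)."""
    pod_name = f"{dag_id}-{task_id}".lower().replace("_", "-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "labels": {"dag_id": dag_id, "task_id": task_id},
        },
        "spec": {
            "containers": [{
                "name": "base",
                "image": image,
                # The container runs exactly one task, then exits.
                "args": ["airflow", "tasks", "run", dag_id, task_id],
            }],
            # The pod is never restarted; a retry gets a brand-new pod.
            "restartPolicy": "Never",
        },
    }

# Each scheduled task gets its own, independent pod spec:
pod_a = build_task_pod_spec("etl_daily", "extract", "my-airflow:2.9")
pod_b = build_task_pod_spec("etl_daily", "load", "my-airflow:2.9")
print(pod_a["metadata"]["name"])  # etl-daily-extract
```

Because each spec is independent, Kubernetes can schedule the resulting pods on different nodes and run them in parallel without the tasks sharing a process or filesystem.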
Result
Each task runs in its own pod, allowing parallel execution and isolation.
Knowing that each task gets its own pod explains how Airflow can scale tasks independently and avoid resource conflicts.
4. Intermediate: Configuring the Kubernetes Executor in Airflow
🤔 Before reading on: do you think configuring the executor requires changing Airflow code or just settings? Commit to your answer.
Concept: You configure the Kubernetes executor by setting Airflow configuration and Kubernetes connection details.
In airflow.cfg, set executor = KubernetesExecutor in the [core] section. Provide Kubernetes cluster access via a kubeconfig file or in-cluster config. Define pod templates and resource limits to control pod behavior. This setup tells Airflow how to talk to Kubernetes and manage pods.
Result
Airflow is ready to create pods dynamically for tasks using Kubernetes.
Understanding configuration helps you control resource use and security when scaling with Kubernetes.
5. Intermediate: Resource Management and Pod Templates
🤔 Before reading on: do you think all pods use the same resources, or can they be customized per task? Commit to your answer.
Concept: Pod templates let you customize pod specs like CPU, memory, and environment variables per task or globally.
You can define pod templates in YAML to specify resource requests, limits, and other pod settings. This controls how much CPU and memory each task pod gets. You can also add labels, volumes, and secrets. This customization ensures efficient and secure task execution.
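A sketch of such a template; the image, labels, and values are illustrative. The Kubernetes executor expects the task container to be named base:

```yaml
# pod_template.yaml: default spec applied to every task pod
# (illustrative values). The task container must be named "base".
apiVersion: v1
kind: Pod
metadata:
  labels:
    team: data-platform
spec:
  containers:
    - name: base
      image: apache/airflow:2.9.0
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
      env:
        - name: ENVIRONMENT
          value: production
```

Individual tasks can override the template through the task's executor_config (for example, a pod_override), so one DAG can mix lightweight and memory-heavy task pods.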
Result
Pods run with appropriate resources and settings, improving performance and security.
Knowing how to customize pods prevents resource waste and helps tasks run reliably in production.
6. Advanced: Dynamic Scaling and Autoscaling Integration
🤔 Before reading on: do you think the Kubernetes executor alone handles scaling, or does it rely on Kubernetes autoscaling? Commit to your answer.
Concept: Kubernetes executor works with Kubernetes autoscaling to add or remove nodes based on pod demand.
While the executor creates pods per task, Kubernetes can add or remove cluster nodes automatically using Cluster Autoscaler. This means if many pods start, new nodes spin up to handle them. When pods finish, nodes can scale down. This combined scaling optimizes resource use and cost.
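The interplay can be illustrated with a toy calculation. This is not the Cluster Autoscaler's actual algorithm (which simulates scheduling pending pods against candidate nodes), just the underlying arithmetic:

```python
# Toy illustration of the scaling interplay (NOT the real Cluster
# Autoscaler algorithm): the executor creates one pod per task, and the
# cluster must grow until the pending pods' resource requests fit.

import math

def nodes_needed(pending_pods: int, cpu_per_pod: float, cpu_per_node: float) -> int:
    """Minimum node count whose total CPU covers all pending task pods."""
    if pending_pods == 0:
        return 0
    return math.ceil(pending_pods * cpu_per_pod / cpu_per_node)

# 40 task pods requesting 0.5 CPU each, on 4-CPU nodes:
print(nodes_needed(40, 0.5, 4.0))  # 5
# When the pods finish, demand drops and nodes can scale back down:
print(nodes_needed(0, 0.5, 4.0))   # 0
```

This is also why accurate resource requests matter: the autoscaler sizes the cluster from what pods request, not from what they actually use.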
Result
Airflow tasks scale dynamically with cluster size adjusting automatically.
Understanding this integration shows how Airflow and Kubernetes together provide full dynamic scaling from tasks to infrastructure.
7. Expert: Handling Pod Lifecycle and Task Failures
🤔 Before reading on: do you think failed tasks leave pods running, or are pods cleaned up automatically? Commit to your answer.
Concept: Pods are cleaned up after task completion or failure, but handling retries and logs requires careful setup.
The executor deletes pods after tasks finish or fail to free resources. Logs are streamed back to Airflow for debugging. For retries, new pods are created. Experts configure pod lifecycle hooks and logging drivers to ensure no resource leaks and easy troubleshooting. Misconfigurations here can cause orphan pods or lost logs.
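A simplified model of that retry flow, with pods mocked as dictionary entries (the real executor creates and deletes pods through the Kubernetes API):

```python
# Simplified model of the retry flow: every attempt gets a fresh pod,
# and the pod is always cleaned up afterwards, success or failure.
# Pods are mocked as dict entries instead of real Kubernetes objects.

active_pods: dict[str, str] = {}  # pod_name -> state

def run_task_with_retries(task_id: str, attempts_until_success: int,
                          max_retries: int = 3) -> bool:
    """Run a task, creating a new pod per attempt and deleting it after."""
    for attempt in range(1, max_retries + 2):  # first try + retries
        pod_name = f"{task_id}-attempt-{attempt}"
        active_pods[pod_name] = "running"        # create the pod
        succeeded = attempt >= attempts_until_success
        # Logs must be captured HERE, before deletion: the pod is ephemeral.
        del active_pods[pod_name]                # always clean up
        if succeeded:
            return True
    return False

# Task succeeds on its second attempt; every attempt used a fresh pod:
assert run_task_with_retries("extract", attempts_until_success=2) is True
assert active_pods == {}  # no orphan pods left behind
```

The invariant to notice is the unconditional cleanup: a correct setup leaves no orphan pods whether the task succeeded, failed, or exhausted its retries.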
Result
Pods are managed efficiently, and task failures are visible and recoverable.
Knowing pod lifecycle details prevents resource leaks and improves reliability in production Airflow deployments.
Under the Hood
The Kubernetes executor acts as a bridge between Airflow's scheduler and the Kubernetes API. When a task is ready, the executor creates a pod spec with task details and submits it to Kubernetes. Kubernetes schedules the pod on a cluster node, runs the container, and reports status back. The executor monitors pod states and streams logs to Airflow. After task completion, the pod is deleted to free resources. This process repeats for each task, enabling parallelism and isolation.
Why designed this way?
This design leverages Kubernetes' native container orchestration strengths to solve Airflow's scaling limits. Instead of managing workers manually, Airflow delegates task execution to Kubernetes pods, which are lightweight and ephemeral. This avoids resource contention and simplifies scaling. Alternatives like fixed worker pools were less flexible and more costly. Kubernetes' API-driven model fits Airflow's dynamic task scheduling perfectly.
Airflow Scheduler
    │
    ▼
┌─────────────────────┐
│ Kubernetes Executor  │
│  (creates pod specs) │
└─────────────────────┘
    │
    ▼ Kubernetes API
┌─────────────────────┐
│ Kubernetes Cluster   │
│ ┌───────────────┐   │
│ │ Pod for Task  │◄──┤
│ └───────────────┘   │
│       Node          │
└─────────────────────┘
    │
    ▼
Pod runs task container
    │
    ▼
Executor monitors pod status and streams logs
    │
    ▼
Pod deleted after completion
Myth Busters - 4 Common Misconceptions
Quick: Does the Kubernetes executor run all tasks in a single pod? Commit yes or no.
Common Belief: The Kubernetes executor runs all Airflow tasks inside one shared pod to save resources.
Reality: Each Airflow task runs in its own separate Kubernetes pod, ensuring isolation and parallelism.
Why it matters: Believing tasks share a pod can lead to misconfigurations and resource conflicts, causing task failures and unpredictable behavior.
Quick: Do you think Kubernetes executor automatically scales cluster nodes? Commit yes or no.
Common Belief: The Kubernetes executor automatically adds or removes cluster nodes as task demand changes.
Reality: The executor creates pods per task, but cluster node scaling is handled separately by the Kubernetes Cluster Autoscaler or manual management.
Why it matters: Assuming the executor manages nodes can cause resource shortages or overspending if autoscaling is not configured.
Quick: Does the Kubernetes executor keep pods running after task failure for debugging? Commit yes or no.
Common Belief: Pods remain running after task failure to allow manual inspection and debugging.
Reality: Pods are deleted after task completion or failure by default; logs must be collected before deletion for debugging.
Why it matters: Expecting pods to stay can cause confusion and lost logs if logging is not properly configured.
Quick: Is the Kubernetes executor suitable for all Airflow workloads? Commit yes or no.
Common Belief: The Kubernetes executor is always the best choice for any Airflow workload.
Reality: It is best for dynamic, scalable workloads but may be overkill for small or simple setups where LocalExecutor or CeleryExecutor suffice.
Why it matters: Using the Kubernetes executor unnecessarily adds complexity and operational overhead.
Expert Zone
1. Pod startup time can impact task latency; optimizing container images and init containers reduces delays.
2. Properly configuring resource requests and limits prevents pod eviction and ensures cluster stability under load.
3. Handling secrets and environment variables securely in pod templates is critical to avoid leaks in multi-tenant clusters.
When NOT to use
Avoid the Kubernetes executor if your Airflow deployment is small, runs on a single machine, or your team lacks Kubernetes expertise. Alternatives like LocalExecutor or CeleryExecutor are simpler and sufficient for low-scale workloads.
Production Patterns
In production, teams use pod templates with resource quotas, integrate with Kubernetes Cluster Autoscaler for node scaling, and configure centralized logging and monitoring. They also use namespaces and RBAC for multi-tenant security and often combine Kubernetes executor with Persistent Volumes for stateful tasks.
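As a sketch of the RBAC part, here is a namespaced Role and RoleBinding granting a scheduler service account (names and namespace are illustrative) just enough access to manage task pods and read their logs:

```yaml
# Minimal RBAC for the scheduler's service account (illustrative names):
# enough to create, watch, and clean up task pods and read their logs.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow-pod-launcher
  namespace: airflow
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow-pod-launcher
  namespace: airflow
subjects:
  - kind: ServiceAccount
    name: airflow-scheduler
    namespace: airflow
roleRef:
  kind: Role
  name: airflow-pod-launcher
  apiGroup: rbac.authorization.k8s.io
```

Scoping the Role to a single namespace keeps a multi-tenant cluster safe: the scheduler can only touch pods in its own namespace.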
Connections
Serverless Computing
Both use dynamic resource allocation to run code on demand without fixed servers.
Understanding Kubernetes executor helps grasp serverless models where functions run in isolated containers triggered by events.
Container Orchestration
Kubernetes executor is a specific application of container orchestration for workflow tasks.
Knowing container orchestration principles clarifies how Airflow leverages Kubernetes to manage task lifecycle and scaling.
Just-in-Time Manufacturing
Both create resources only when needed to reduce waste and improve efficiency.
Seeing Kubernetes executor as just-in-time resource creation helps appreciate its cost and performance benefits.
Common Pitfalls
#1 Not configuring Kubernetes access correctly causes Airflow to fail creating pods.
Wrong approach:
[core]
executor = KubernetesExecutor
# Missing or incorrect kubeconfig or in-cluster config
# No permissions set for the Airflow service account
Correct approach:
[core]
executor = KubernetesExecutor

[kubernetes]
config_file = /path/to/kubeconfig
namespace = airflow
# Ensure the Airflow service account has pod create/delete permissions
Root cause: Misunderstanding that Airflow needs proper Kubernetes credentials and permissions to create pods.
#2 Setting no resource limits leads to pods consuming excessive CPU or memory.
Wrong approach (pod_template.yaml):
containers:
  - name: base
    image: airflow-task
    # No resources section
Correct approach (pod_template.yaml):
containers:
  - name: base
    image: airflow-task
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
Root cause: Ignoring Kubernetes best practices for resource management causes unstable cluster behavior.
#3 Assuming logs are stored in pods after deletion causes loss of task logs.
Wrong approach: No log persistence configured; relying on pod logs after pod deletion.
Correct approach: Configure Airflow remote logging to ship task logs to centralized storage such as Elasticsearch or S3 before the pod is deleted.
Root cause: Not understanding that pods are ephemeral and logs must be collected externally.
Key Takeaways
Kubernetes executor runs each Airflow task in its own pod, enabling true parallelism and isolation.
Dynamic pod creation allows Airflow to scale workloads efficiently without pre-allocating fixed resources.
Proper configuration of Kubernetes access, pod templates, and resource limits is essential for stable and secure operation.
Integration with Kubernetes autoscaling enables full dynamic scaling from tasks to cluster nodes.
Understanding pod lifecycle and logging is critical to avoid resource leaks and ensure task observability.