Airflow Concepts · Beginner · 4 min read

CeleryExecutor in Airflow: What It Is and How It Works

The CeleryExecutor in Airflow is a way to run tasks in parallel across multiple worker machines using Celery, a distributed task queue. It helps Airflow scale by distributing task execution to many workers instead of running everything on one machine.
⚙️

How It Works

Imagine you have many chores to do, like washing dishes, vacuuming, and cooking. Doing them all alone takes a long time. Now imagine you have friends who can help you, each taking one chore. This is how CeleryExecutor works in Airflow. Instead of running all tasks on one computer, it sends tasks to many worker machines that do the work in parallel.

Under the hood, Airflow uses Celery, a tool that manages a queue of tasks and distributes them to workers. When a task is ready, Airflow puts it in a queue. Workers pick tasks from this queue and run them independently. This way, many tasks can run at the same time, speeding up workflows and handling bigger loads.
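The queue-and-workers pattern described above can be sketched with Python's standard library. This is a toy illustration, not Celery itself: in a real deployment the queue lives in a broker such as Redis or RabbitMQ, and the workers are separate processes on other machines rather than threads.

```python
import queue
import threading

# Toy sketch of the pattern CeleryExecutor relies on: a scheduler puts
# ready tasks on a shared queue, and workers pull and run them in parallel.
task_queue = queue.Queue()
results = []
lock = threading.Lock()

def worker(name):
    while True:
        task = task_queue.get()
        if task is None:              # sentinel: no more work for this worker
            task_queue.task_done()
            break
        with lock:
            results.append(f"{name} ran {task}")
        task_queue.task_done()

# The scheduler's role: enqueue tasks whose dependencies are met.
for t in ["task1", "task2", "task3"]:
    task_queue.put(t)

# Two "worker machines" (threads here) pick tasks off the queue.
threads = [threading.Thread(target=worker, args=(f"worker{i}",)) for i in (1, 2)]
for th in threads:
    th.start()
for _ in threads:
    task_queue.put(None)              # one stop sentinel per worker
for th in threads:
    th.join()

print(len(results))                   # all three tasks ran, split across workers
```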

💻

Example

This example shows how to configure Airflow to use CeleryExecutor and a simple DAG that runs tasks in parallel.

airflow.cfg (ini):
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://user:password@localhost:5432/airflow

dags/parallel_tasks.py (python):

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'start_date': datetime(2024, 1, 1),
}

dag = DAG('parallel_tasks', default_args=default_args, schedule_interval=None)

t1 = BashOperator(task_id='task1', bash_command='echo Task 1', dag=dag)
t2 = BashOperator(task_id='task2', bash_command='echo Task 2', dag=dag)
t3 = BashOperator(task_id='task3', bash_command='echo Task 3', dag=dag)

# No dependencies are declared between t1, t2 and t3, so the scheduler
# is free to dispatch all three to workers at the same time.
Output (order may vary, since the tasks run in parallel):
Task 1
Task 2
Task 3
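With the config above in place, each component is started separately; the commands below use the Airflow 2.x CLI, and each worker machine runs its own worker process.

```shell
# Start the scheduler, which queues ready tasks onto the broker
airflow scheduler

# On each worker machine, start a Celery worker that pulls from the broker
airflow celery worker

# Optional: Flower, a web UI for monitoring Celery workers
airflow celery flower
```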
🎯

When to Use

Use CeleryExecutor when you need to run many tasks at the same time or when your tasks are too heavy for a single machine. It is great for teams or projects that want to scale Airflow across multiple servers.

For example, if you have data pipelines that process large datasets or many independent jobs, CeleryExecutor helps by spreading the work. It also supports retrying failed tasks and handling worker failures gracefully.
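Retries, for instance, are configured per task or shared via default_args. The keys below are standard Airflow task arguments; the values are just illustrative.

```python
from datetime import timedelta

# Illustrative retry settings. 'retries' and 'retry_delay' are standard
# Airflow task arguments; the values here are examples, not recommendations.
default_args = {
    'retries': 2,                          # re-run a failed task up to 2 times
    'retry_delay': timedelta(minutes=5),   # wait 5 minutes between attempts
}

print(default_args['retries'])
```

Passing this dict as default_args to a DAG applies the settings to every task in it, and individual operators can still override them.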

Key Points

  • CeleryExecutor distributes tasks to multiple workers for parallel execution.
  • It uses a message broker like Redis or RabbitMQ to manage task queues.
  • It improves scalability and reliability of Airflow workflows.
  • Requires setting up worker machines and a broker service.

Key Takeaways

  • CeleryExecutor lets Airflow run tasks on many workers at the same time for better speed and scale.
  • It uses Celery with a message broker like Redis to manage and distribute tasks.
  • Ideal for large or complex workflows that need parallel processing across multiple machines.
  • Requires configuring Airflow, workers, and a broker service to work properly.