CeleryExecutor in Airflow: What It Is and How It Works
CeleryExecutor in Airflow is a way to run tasks in parallel across multiple worker machines using Celery, a distributed task queue. It helps Airflow scale by distributing task execution to many workers instead of running everything on one machine.
How It Works
Imagine you have many chores to do, like washing dishes, vacuuming, and cooking. Doing them all alone takes a long time. Now imagine you have friends who can help you, each taking one chore. This is how CeleryExecutor works in Airflow. Instead of running all tasks on one computer, it sends tasks to many worker machines that do the work in parallel.
Under the hood, Airflow uses Celery, a tool that manages a queue of tasks and distributes them to workers. When a task is ready, Airflow puts it in a queue. Workers pick tasks from this queue and run them independently. This way, many tasks can run at the same time, speeding up workflows and handling bigger loads.
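The producer/consumer pattern described above can be sketched in plain Python with the standard library. This is a toy illustration only, not Airflow or Celery code: the names `task_queue`, `worker`, and `NUM_WORKERS` are made up for the sketch. Airflow's scheduler plays the role of the producer putting tasks on the queue, and Celery workers play the role of the consumers.

```python
import queue
import threading

NUM_WORKERS = 3
task_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    # Each worker pulls tasks from the shared queue until it is drained,
    # the way a Celery worker pulls tasks from the broker.
    while True:
        try:
            task_id = task_queue.get_nowait()
        except queue.Empty:
            return
        with results_lock:
            results.append(f"ran {task_id}")
        task_queue.task_done()

# The "scheduler" enqueues the tasks that are ready to run...
for i in range(1, 6):
    task_queue.put(f"task{i}")

# ...and several workers consume them concurrently.
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the three workers pull from the same queue independently, five tasks finish in roughly the time one worker would need for two; that is the whole idea behind CeleryExecutor's parallelism.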
Example
This example shows the airflow.cfg settings for CeleryExecutor and a simple DAG whose three tasks can run in parallel.
First, in airflow.cfg:

```
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://user:password@localhost:5432/airflow
```

Then a DAG whose tasks have no dependencies between them:

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

default_args = {
    'start_date': datetime(2024, 1, 1),
}

dag = DAG('parallel_tasks', default_args=default_args, schedule_interval=None)

t1 = BashOperator(task_id='task1', bash_command='echo Task 1', dag=dag)
t2 = BashOperator(task_id='task2', bash_command='echo Task 2', dag=dag)
t3 = BashOperator(task_id='task3', bash_command='echo Task 3', dag=dag)

# No dependencies are set between t1, t2, and t3, so the scheduler can
# queue all three at once and separate Celery workers run them in parallel.
```
When to Use
Use CeleryExecutor when you need to run many tasks at the same time or when your tasks are too heavy for a single machine. It is great for teams or projects that want to scale Airflow across multiple servers.
For example, if you have data pipelines that process large datasets or many independent jobs, CeleryExecutor helps by spreading the work. It also supports retrying failed tasks and handling worker failures gracefully.
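In a real DAG, retries are configured declaratively via the `retries` and `retry_delay` task arguments. The retry idea itself can be sketched in a few lines of plain Python; note that `run_with_retries` and `flaky_task` are hypothetical names for this illustration, not Airflow APIs.

```python
import time

def run_with_retries(fn, retries=3, delay=0.01):
    """Call fn(); on failure, wait and try again, up to `retries` extra attempts."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            attempt += 1
            if attempt > retries:
                raise  # out of retries: surface the failure
            time.sleep(delay)  # analogous to Airflow's retry_delay

calls = {"count": 0}

def flaky_task():
    # Fails twice, then succeeds -- like a transient network error.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"

result = run_with_retries(flaky_task)
```

Airflow applies this same pattern per task attempt, and because tasks live in a broker-backed queue, a task whose worker dies can be picked up and re-run elsewhere.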
Key Points
- CeleryExecutor distributes tasks to multiple workers for parallel execution.
- It uses a message broker like Redis or RabbitMQ to manage task queues.
- It improves scalability and reliability of Airflow workflows.
- Requires setting up worker machines and a broker service.
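As a rough sketch of that setup, a minimal Airflow 2 deployment runs these processes (assuming the Redis broker and the metadata database from the configuration above are already running; exact commands can vary by version and packaging):

```shell
airflow scheduler        # puts ready tasks on the queue
airflow celery worker    # run on each worker machine; pulls and executes tasks
airflow webserver        # UI, usually on the same host as the scheduler
```

Adding capacity is then a matter of starting `airflow celery worker` on more machines pointed at the same broker.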