
Celery executor for distributed execution in Apache Airflow - Commands & Configuration

Introduction
When you want to run many tasks in parallel across multiple machines, the Celery executor helps Airflow distribute the work efficiently. It uses a message queue to send tasks to worker machines that run them independently.
When your Airflow tasks take a long time and you want to speed up execution by running them on multiple servers.
When you need to scale your workflow execution beyond a single machine's capacity.
When you want to isolate task execution so that one slow or failing task does not block others.
When you want to add or remove worker machines dynamically without stopping the scheduler.
When you want to use a message broker like Redis or RabbitMQ to manage task distribution.
Config File - airflow.cfg
airflow.cfg
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+sqlite:///airflow.db

[logging]
base_log_folder = /usr/local/airflow/logs

[database]
sql_alchemy_conn = sqlite:///airflow.db

The [core] section sets the executor to CeleryExecutor to enable distributed task execution.

The [celery] section configures the message broker URL (Redis here) and the result backend where task states are stored; the db+ prefix tells Celery to use a SQLAlchemy database backend.

The [logging] section defines where task logs are saved.

The [database] section sets the connection to the Airflow metadata database. SQLite is fine for a local demo, but see Common Mistakes below before using it in production.
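The same settings can also be supplied as environment variables, which Airflow reads using its AIRFLOW__SECTION__KEY naming convention. This is convenient when worker machines are configured by a deployment script rather than a shared airflow.cfg. A minimal sketch, with the Redis host and database paths carried over from the config above as placeholders:

```shell
# Airflow maps AIRFLOW__<SECTION>__<KEY> environment variables
# onto the corresponding airflow.cfg entries.

# [core] executor = CeleryExecutor
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor

# [celery] broker_url and result_backend
export AIRFLOW__CELERY__BROKER_URL=redis://localhost:6379/0
export AIRFLOW__CELERY__RESULT_BACKEND=db+sqlite:///airflow.db

# [database] metadata database connection
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:///airflow.db
```

Environment variables take precedence over airflow.cfg, so the same image can be reused across scheduler and worker machines with per-host overrides.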

Commands
Initializes the Airflow metadata database to store task and workflow information.
Terminal
airflow db init
Expected Output
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> head
INFO [airflow.utils.db] Creating tables
INFO [airflow.utils.db] Tables created successfully
Starts the Airflow scheduler which sends tasks to the Celery workers for execution.
Terminal
airflow scheduler
Expected Output
INFO [airflow.jobs.scheduler_job] Starting the scheduler
INFO [airflow.executors.celery_executor] Using CeleryExecutor
INFO [airflow.jobs.scheduler_job] Scheduler started
Starts a Celery worker process that listens for tasks from the scheduler and executes them.
Terminal
airflow celery worker
Expected Output
INFO [celery.worker.consumer] Connected to redis://localhost:6379/0
INFO [celery.worker.consumer] celery@worker ready.
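Workers can also be tuned at startup: --concurrency caps how many tasks one worker runs at once, and --queues restricts which broker queues it consumes from. A sketch using these flags (the queue name etl is a made-up example; running the command requires a live broker, so here it is only assembled and printed):

```shell
# Hypothetical tuning example: 4 task slots, listening on a custom queue.
# --concurrency limits parallel tasks per worker; --queues selects which
# broker queues this worker pulls from.
WORKER_CMD="airflow celery worker --concurrency 4 --queues etl"
echo "$WORKER_CMD"
```

Tasks are routed to a named queue via the queue argument on the Airflow operator, letting you dedicate specific workers to heavy or specialized workloads.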
Starts the Airflow webserver to monitor workflows and task status through a web interface.
Terminal
airflow webserver
Expected Output
INFO [airflow.www.app] Starting Flask app
INFO [airflow.www.app] Running on http://0.0.0.0:8080/
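Beyond the Airflow webserver, Celery ships with Flower, a dashboard for watching workers, queues, and task throughput, which Airflow exposes through its own subcommand. A sketch (5555 is Flower's default port; as with the worker command, running it requires the broker to be up, so it is only assembled and printed here):

```shell
# Flower serves a Celery monitoring dashboard, by default on port 5555.
FLOWER_CMD="airflow celery flower --port 5555"
echo "$FLOWER_CMD"
```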
Key Concept

If you remember nothing else from this pattern, remember: Celery executor uses a message broker to distribute tasks to multiple worker machines for parallel execution.

Common Mistakes
Not starting the Celery worker process after configuring the executor.
Without workers running, tasks will never be executed and remain queued indefinitely.
Always start one or more 'airflow celery worker' processes to handle tasks.
Using an incorrect or unreachable broker URL in the airflow.cfg file.
The scheduler and workers cannot communicate, so tasks fail to dispatch or execute.
Verify the broker URL is correct and the message broker service (e.g., Redis) is running.
Using SQLite as the metadata database in production with Celery executor.
SQLite does not support concurrent writes well, causing errors with multiple workers.
Use a production-ready database like PostgreSQL or MySQL for the Airflow metadata database.
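To act on that last point, the SQLite lines in the airflow.cfg above would be swapped for a PostgreSQL connection. A sketch, where the host, database name, and credentials are placeholders:

airflow.cfg
[celery]
result_backend = db+postgresql://airflow_user:airflow_pass@localhost/airflow

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost/airflow

Note the two URL schemes differ: result_backend keeps Celery's db+ prefix, while sql_alchemy_conn is a plain SQLAlchemy URL.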
Summary
Set 'executor = CeleryExecutor' in airflow.cfg to enable distributed task execution.
Start the Airflow scheduler to send tasks to workers.
Run one or more Celery worker processes to execute tasks.
Use a message broker like Redis to manage task distribution.
Start the Airflow webserver to monitor task progress and status.