How to Use Celery with Airflow: Setup and Example
To use the CeleryExecutor with Airflow, set the executor to CeleryExecutor in the airflow.cfg file and configure a message broker such as Redis or RabbitMQ. This setup lets Airflow distribute tasks across multiple worker nodes, with Celery handling asynchronous task execution.
Syntax
To enable Celery with Airflow, update the airflow.cfg configuration file with the following key settings:
- executor: Set to CeleryExecutor to enable Celery task distribution.
- broker_url: URL of the message broker (e.g., Redis or RabbitMQ) that Celery uses to send tasks.
- result_backend: Backend to store task results, typically Redis or a database.
This configuration allows Airflow's scheduler to send tasks to Celery workers asynchronously.
```ini
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+sqlite:///airflow.db
```
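As a quick sanity check before restarting Airflow, the file can be parsed with Python's standard-library configparser, since airflow.cfg uses INI syntax. This sketch reads the example values from an inline string rather than a real file path:

```python
# Sanity-check the Celery settings in an airflow.cfg-style config.
# The inline string mirrors the example config above; in practice you
# would pass the real path to config.read() instead.
import configparser

AIRFLOW_CFG = """
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+sqlite:///airflow.db
"""

config = configparser.ConfigParser()
config.read_string(AIRFLOW_CFG)

# Confirm the three settings the CeleryExecutor depends on are present.
assert config.get('core', 'executor') == 'CeleryExecutor'
assert config.get('celery', 'broker_url').startswith('redis://')
print(config.get('celery', 'result_backend'))
```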
Example
This example shows a simple Airflow DAG that runs a Python task distributed via CeleryExecutor. It assumes Redis is running locally as the broker.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def print_hello():
    print('Hello from CeleryExecutor!')

default_args = {
    'start_date': datetime(2024, 1, 1),
}

dag = DAG('celery_example', default_args=default_args, schedule_interval='@once')

task = PythonOperator(
    task_id='hello_task',
    python_callable=print_hello,
    dag=dag,
)
```
Output
Hello from CeleryExecutor!
Common Pitfalls
Common mistakes when using Celery with Airflow include:
- Not running a message broker like Redis or RabbitMQ before starting Airflow workers.
- Forgetting to start airflow celery worker processes to consume tasks.
- Misconfiguring broker_url or result_backend, causing connection errors.
- Using SQLite as the result_backend in production, which is not recommended.
Always verify broker connectivity and run workers separately from the scheduler.
```bash
## Wrong: Not running the Redis broker
# airflow scheduler runs, but tasks never execute

## Right: Start Redis, the workers, and the scheduler
redis-server &
airflow celery worker &
airflow scheduler &
```
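One way to verify broker connectivity before starting workers is a short TCP probe against the broker's host and port. This is a minimal sketch using only the Python standard library; the broker URL and the Redis default port of 6379 are assumptions matching the example config:

```python
# Minimal broker reachability check using only the standard library.
# Assumes a redis:// style broker_url as in the example config above.
import socket
from urllib.parse import urlparse

def broker_host_port(broker_url: str) -> tuple:
    """Extract (host, port) from a Celery broker URL, defaulting to
    localhost:6379 (the Redis default) when parts are missing."""
    parsed = urlparse(broker_url)
    return parsed.hostname or 'localhost', parsed.port or 6379

def broker_reachable(broker_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the broker succeeds."""
    host, port = broker_host_port(broker_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == '__main__':
    url = 'redis://localhost:6379/0'
    print('broker reachable:', broker_reachable(url))
```

Running this before launching the scheduler and workers turns a silent "tasks never execute" hang into an immediate, explicit failure.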
Quick Reference
| Setting | Description | Example Value |
|---|---|---|
| executor | Defines Airflow executor type | CeleryExecutor |
| broker_url | Message broker URL for Celery | redis://localhost:6379/0 |
| result_backend | Backend to store task results | db+sqlite:///airflow.db |
| celery worker | Command to start Celery worker | airflow celery worker |
| scheduler | Command to start Airflow scheduler | airflow scheduler |
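The same settings can also be supplied as environment variables instead of editing airflow.cfg: Airflow maps variables named AIRFLOW__{SECTION}__{KEY} onto config options, and these take precedence over the file. A sketch using the values from the Quick Reference table:

```bash
# Airflow reads config overrides from environment variables named
# AIRFLOW__{SECTION}__{KEY}, which take precedence over airflow.cfg.
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
export AIRFLOW__CELERY__BROKER_URL=redis://localhost:6379/0
export AIRFLOW__CELERY__RESULT_BACKEND=db+sqlite:///airflow.db

# Verify the values are visible to the shell that will launch Airflow.
echo "$AIRFLOW__CORE__EXECUTOR"
```

This is convenient for containerized deployments, where the same image can be pointed at different brokers without baking a config file into it.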
Key Takeaways
- Set executor to CeleryExecutor and configure broker_url in airflow.cfg to use Celery with Airflow.
- Run a message broker like Redis and start Celery workers to process tasks asynchronously.
- Use a reliable result backend; avoid SQLite in production for Celery task results.
- Always start both the scheduler and Celery workers to enable distributed task execution.
- Check broker connectivity and worker logs to troubleshoot common Celery integration issues.