
How to Use Celery with Airflow: Setup and Example

To use CeleryExecutor with Airflow, set the executor to CeleryExecutor in the airflow.cfg file and configure a message broker like Redis or RabbitMQ. This setup allows Airflow to distribute tasks across multiple worker nodes using Celery for asynchronous task execution.

📐 Syntax

To enable Celery with Airflow, update the airflow.cfg configuration file with the following key settings:

  • executor: Set to CeleryExecutor to enable Celery task distribution.
  • broker_url: URL of the message broker (e.g., Redis or RabbitMQ) Celery uses to send tasks.
  • result_backend: Backend to store task results, often Redis or a database.

This configuration allows Airflow's scheduler to send tasks to Celery workers asynchronously.
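The broker_url follows standard URL syntax (scheme://host:port/db). As a quick illustration, not Airflow code, the pieces can be pulled apart with Python's urllib.parse:

```python
from urllib.parse import urlparse

# Decompose the example broker URL used in the config below
u = urlparse("redis://localhost:6379/0")
print(u.scheme)            # redis
print(u.hostname, u.port)  # localhost 6379
print(u.path.lstrip("/"))  # 0  (the Redis database number)
```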

ini
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+sqlite:///airflow.db
; SQLite works for local testing; use a PostgreSQL or MySQL backend in production
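Since airflow.cfg uses standard INI syntax, a quick sanity check is to read the settings back with Python's configparser. This is a local sketch only; Airflow itself also exposes `airflow config get-value` for the same purpose:

```python
import configparser

# Minimal sketch: parse the snippet above and confirm the key settings
cfg = configparser.ConfigParser()
cfg.read_string("""
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+sqlite:///airflow.db
""")

print(cfg["core"]["executor"])       # CeleryExecutor
print(cfg["celery"]["broker_url"])   # redis://localhost:6379/0
```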

💻 Example

This example shows a simple Airflow DAG that runs a Python task distributed via CeleryExecutor. It assumes Redis is running locally as the broker.

python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def print_hello():
    print('Hello from CeleryExecutor!')

default_args = {
    'start_date': datetime(2024, 1, 1),
}

# Note: 'schedule' replaces the deprecated 'schedule_interval' (Airflow 2.4+)
dag = DAG('celery_example', default_args=default_args, schedule='@once', catchup=False)

task = PythonOperator(
    task_id='hello_task',
    python_callable=print_hello,
    dag=dag
)
Output
Hello from CeleryExecutor!
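The point of CeleryExecutor is that the scheduler hands tasks to separate worker processes instead of running them itself. The idea can be sketched with the standard library alone; this is a conceptual analogy, not Airflow or Celery code, and real Celery workers run as separate processes, often on separate machines:

```python
from concurrent.futures import ThreadPoolExecutor

def print_hello():
    return 'Hello from CeleryExecutor!'

# A local pool stands in for the Celery workers in this sketch
with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(print_hello)  # the scheduler "sends" the task
    print(future.result())             # a worker runs it; the result comes back
```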

⚠️ Common Pitfalls

Common mistakes when using Celery with Airflow include:

  • Not running a message broker like Redis or RabbitMQ before starting Airflow workers.
  • Forgetting to start airflow celery worker processes to consume tasks.
  • Misconfiguring broker_url or result_backend, which causes connection errors.
  • Using SQLite as result_backend in production, which is not recommended.

Always verify broker connectivity and run workers separately from the scheduler.
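For verifying broker connectivity, a bare TCP check before starting the workers is often enough. A stdlib-only sketch; the host and port are assumptions for a local Redis, so adjust them for your broker:

```python
import socket

def broker_reachable(host="localhost", port=6379, timeout=2.0):
    """Return True if a TCP connection to the broker succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not broker_reachable():
    print("Broker unreachable: start Redis before the scheduler and workers")
```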

bash
# Wrong: no broker running; the scheduler queues tasks but they never execute
airflow scheduler

# Right: start Redis, then the workers, then the scheduler
redis-server &
airflow celery worker &
airflow scheduler &

📊 Quick Reference

Setting          Description                           Example Value
executor         Defines Airflow executor type         CeleryExecutor
broker_url       Message broker URL for Celery         redis://localhost:6379/0
result_backend   Backend to store task results         db+sqlite:///airflow.db
celery worker    Command to start a Celery worker      airflow celery worker
scheduler        Command to start Airflow scheduler    airflow scheduler

Key Takeaways

  • Set executor to CeleryExecutor and configure broker_url in airflow.cfg to use Celery with Airflow.
  • Run a message broker like Redis and start Celery workers to process tasks asynchronously.
  • Use a reliable result backend; avoid SQLite in production for Celery task results.
  • Always start both the scheduler and Celery workers to enable distributed task execution.
  • Check broker connectivity and worker logs to troubleshoot common Celery integration issues.