0
0
Apache Airflowdevops~10 mins

Celery executor for distributed execution in Apache Airflow - Step-by-Step Execution

Choose your learning style9 modes available
Process Flow - Celery executor for distributed execution
Start Airflow Scheduler
Scheduler sends task to Celery Broker
Celery Broker queues task
Celery Worker picks task from Broker
Worker executes task
Worker sends task result back to Backend
Scheduler updates task status
The Airflow scheduler sends tasks to a message broker. Celery workers pick tasks from the broker, execute them, and send results back. The scheduler updates task status accordingly.
Execution Sample
Apache Airflow
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

dag = DAG('example', start_date=datetime(2024,1,1))
t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)
Defines a simple Airflow DAG with one task that runs the 'date' command using Celery executor.
Process Table
StepComponentActionState ChangeOutput/Result
1SchedulerDetects new task 'print_date'Task queued to Celery brokerTask 'print_date' sent to broker
2Celery BrokerReceives task 'print_date'Task added to queueTask waiting in queue
3Celery WorkerFetches task 'print_date' from brokerTask removed from queue, execution startedWorker starts running 'date' command
4Celery WorkerExecutes 'date' commandTask runningOutputs current date/time
5Celery WorkerSends result back to backendTask marked as completedResult stored in Airflow metadata DB
6SchedulerUpdates task status to successTask state updatedTask 'print_date' marked SUCCESS
7SchedulerChecks for next tasksNo more tasksScheduler idle
💡 All tasks executed and marked complete; no pending tasks remain
Status Tracker
VariableStartAfter Step 1After Step 3After Step 5Final
Task StateNoneQueuedRunningCompletedSuccess
Broker QueueEmptyContains 'print_date'EmptyEmptyEmpty
Worker StatusIdleIdleBusy executing 'print_date'IdleIdle
Key Moments - 3 Insights
Why does the task state change from 'Queued' to 'Running' only after the worker picks it?
Because the scheduler only queues the task to the broker. The task state changes to 'Running' when a worker actually starts executing it, as shown in steps 2 and 3 of the execution table.
What happens if no worker picks the task from the broker?
The task stays in the broker queue in 'Queued' state and does not progress to 'Running' or 'Completed', as seen in step 2 where the broker holds the task until a worker fetches it.
How does the scheduler know when a task is completed?
The worker sends the result back to the backend, which updates the task status. The scheduler then reads this update and marks the task as 'Success', as shown in steps 5 and 6.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, at which step does the Celery worker start executing the task?
AStep 3
BStep 2
CStep 4
DStep 5
💡 Hint
Check the 'Action' column for when the worker fetches the task from the broker.
According to the variable tracker, what is the state of the broker queue after step 3?
AContains multiple tasks
BEmpty
CContains 'print_date'
DUnknown
💡 Hint
Look at the 'Broker Queue' row after 'After Step 3' column.
If the worker fails to send the result back, what will the scheduler's task state remain as?
ASuccess
BQueued
CRunning
DFailed
💡 Hint
Refer to the key moment about how the scheduler updates task status after receiving results.
Concept Snapshot
Celery executor lets Airflow run tasks on multiple workers.
Scheduler sends tasks to a message broker.
Workers pick tasks from the broker and run them.
Workers send results back to update task status.
This enables distributed, parallel task execution.
Full Transcript
This visual execution shows how Airflow uses the Celery executor for distributed task execution. The scheduler detects a new task and sends it to the Celery broker, which queues it. A Celery worker picks the task from the broker and starts executing it. After running the task, the worker sends the result back to the backend. The scheduler then updates the task status to success. Variables like task state, broker queue, and worker status change step-by-step to reflect this flow. Key moments clarify common confusions about task state changes and communication between components. The quiz tests understanding of when tasks move between states and how components interact.