Apache Airflow · DevOps · ~10 mins

Airflow architecture (scheduler, webserver, executor, metadata DB) - Step-by-Step Execution

Process Flow - Airflow architecture (scheduler, webserver, executor, metadata DB)
1. User submits DAG
2. Webserver receives DAG
3. Scheduler scans DAGs
4. Scheduler triggers tasks
5. Executor runs tasks
6. Tasks update Metadata DB
7. Webserver reads Metadata DB for UI
This flow shows how the Airflow components interact: the user submits DAG files, the webserver makes them visible in the UI, the scheduler scans and triggers tasks, the executor runs them, and the metadata DB stores the state that the webserver reads back for display.
Execution Sample
# Simplified pseudocode of the Airflow control flow (not real API calls)
webserver.start()            # serve the UI
scheduler.scan_dags()        # find tasks that are ready to run
scheduler.trigger_tasks()    # queue ready task instances
executor.run_tasks()         # execute tasks on workers
metadata_db.update_state()   # record task states
webserver.show_ui()          # read states back for display
This code simulates the main Airflow components working together to run workflows and update the UI.
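The same flow can be sketched as a small, self-contained Python simulation. All class names here (MetadataDB, Executor, Scheduler) are illustrative stand-ins, not real Airflow classes:

```python
# Toy simulation of the flow above. Class names are illustrative only --
# they are not part of the real Airflow API.

class MetadataDB:
    """Stands in for the metadata database that stores task states."""
    def __init__(self):
        self.task_states = {}

    def update_state(self, task_id, state):
        self.task_states[task_id] = state


class Executor:
    """Runs tasks and records their state transitions in the DB."""
    def __init__(self, db):
        self.db = db

    def run(self, task_id):
        self.db.update_state(task_id, "running")
        # ... the task body would execute here ...
        self.db.update_state(task_id, "success")


class Scheduler:
    """Scans a DAG's tasks and hands them to the executor."""
    def __init__(self, executor):
        self.executor = executor

    def scan_and_trigger(self, dag_tasks):
        for task_id in dag_tasks:
            self.executor.run(task_id)


db = MetadataDB()
Scheduler(Executor(db)).scan_and_trigger(["extract", "transform", "load"])
print(db.task_states)
# {'extract': 'success', 'transform': 'success', 'load': 'success'}
```

Note that the "webserver" never appears in the run path: in Airflow it only reads the metadata DB, which is why the simulation ends with the DB holding the final task states.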
Process Table
| Step | Component | Action | Result/State Change |
|------|-----------|--------|---------------------|
| 1 | User | Submits DAG file | DAG file available to Webserver and Scheduler |
| 2 | Webserver | Receives DAG and serves UI | DAG visible in Airflow UI |
| 3 | Scheduler | Scans DAG files | Identifies tasks to schedule |
| 4 | Scheduler | Triggers task instances | Tasks queued for execution |
| 5 | Executor | Runs tasks | Tasks execute on workers |
| 6 | Tasks | Update Metadata DB | Task states updated (running, success, failed) |
| 7 | Webserver | Reads Metadata DB | UI shows updated task status |
| 8 | Scheduler | Repeats scanning and triggering | Continuous workflow management |
| 9 | Exit | No new tasks to schedule | Waits for next cycle |
💡 Scheduler waits when no tasks are ready; cycle repeats continuously
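The repeating scan/trigger/wait cycle (Steps 3, 4, 8, 9) can be sketched as a small polling loop. This is a hypothetical illustration, not Airflow's actual scheduler code; `max_cycles` bounds the loop for the example, whereas a real scheduler runs until stopped:

```python
import time

def scheduler_loop(scan_dags, trigger, max_cycles=3, poll_interval=0.01):
    """Illustrative scheduler cycle: scan, trigger ready tasks, then wait."""
    for _ in range(max_cycles):
        for task in scan_dags():      # Step 3: scan DAGs for ready tasks
            trigger(task)             # Step 4: queue task instances
        time.sleep(poll_interval)     # Step 9: wait before the next cycle

triggered = []
scheduler_loop(scan_dags=lambda: ["task_a"], trigger=triggered.append)
print(triggered)
# ['task_a', 'task_a', 'task_a']
```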
Status Tracker
| Variable | Start | After Step 3 | After Step 4 | After Step 6 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| DAG files | Empty | Loaded | Loaded | Loaded | Loaded |
| Task queue | Empty | Empty | Queued | Running/Completed | Empty or updated |
| Metadata DB | Empty | Empty | Empty | Updated with task states | Updated |
| UI display | Empty | Shows DAGs | Shows DAGs | Shows running tasks | Shows updated task states |
Key Moments - 3 Insights
Why does the scheduler scan DAG files repeatedly?
The scheduler scans DAG files continuously (see Step 3 and Step 8) to detect new or updated workflows and to trigger tasks as needed.
How does the webserver know the current status of tasks?
The webserver reads the Metadata DB (Step 7) where task states are updated by running tasks, so it can show accurate status in the UI.
What role does the executor play in task execution?
The executor runs the actual tasks on workers (Step 5), changing their state which is then recorded in the Metadata DB.
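The state changes the executor drives (Steps 5-6) follow a simple progression: queued, then running, then success or failed. A minimal sketch of that state machine, with transition names taken from the process table (not from Airflow's actual state module):

```python
# Illustrative task-state machine: the executor moves a task from
# queued -> running, and the terminal state is recorded in the metadata DB.
VALID_TRANSITIONS = {
    "queued": {"running"},
    "running": {"success", "failed"},
}

def transition(current, new):
    if new not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new

state = "queued"
state = transition(state, "running")   # executor picks up the task
state = transition(state, "success")   # task finishes cleanly
print(state)
# success
```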
Visual Quiz - 3 Questions
Test your understanding
Looking at the process table, at which step does the scheduler trigger tasks?
A. Step 3
B. Step 5
C. Step 4
D. Step 7
💡 Hint
Check the 'Action' column for when tasks are queued by the scheduler.
According to the variable tracker, what is the state of the task queue after Step 6?
A. Running/Completed
B. Empty
C. Queued
D. Loaded
💡 Hint
Look at the 'Task queue' row under 'After Step 6' in the variable tracker.
If the Metadata DB did not update after task execution, what would the webserver show?
A. No DAGs available
B. Outdated or no task status
C. Updated task status
D. Scheduler status
💡 Hint
Refer to Step 7 where the webserver reads the Metadata DB for UI updates.
Concept Snapshot
Airflow architecture:
- Webserver: serves UI and DAGs
- Scheduler: scans DAGs, triggers tasks
- Executor: runs tasks on workers
- Metadata DB: stores task and DAG states
Scheduler and executor work continuously to manage workflows.
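The webserver's read-only role (Step 7) can be shown in a few lines. Here a plain dict stands in for the metadata DB; `render_status` is a hypothetical helper, not a real Airflow function:

```python
# Toy "webserver" view: the UI only *reads* the metadata store,
# it never runs tasks itself.
def render_status(metadata):
    return "\n".join(f"{task}: {state}"
                     for task, state in sorted(metadata.items()))

metadata_db = {"extract": "success", "transform": "running"}
print(render_status(metadata_db))
# extract: success
# transform: running
```

This separation is why a stale metadata DB produces a stale UI: the webserver has no other source of task state.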
Full Transcript
Airflow architecture involves four main parts working together. The user submits DAG files which the webserver receives and shows in the UI. The scheduler scans these DAGs repeatedly to find tasks to run. When tasks are ready, the scheduler triggers them and the executor runs them on workers. As tasks run, they update their status in the metadata database. The webserver reads this database to show the current status of workflows and tasks. This cycle repeats continuously to manage workflows smoothly.