Apache Airflow · DevOps · ~10 mins

Airflow architecture (scheduler, webserver, executor, metadata DB) - Step-by-Step Execution

Process Flow - Airflow architecture (scheduler, webserver, executor, metadata DB)
1. User submits DAG
2. Webserver receives DAG
3. Scheduler scans DAGs
4. Scheduler triggers tasks
5. Executor runs tasks
6. Tasks update Metadata DB
7. Webserver reads Metadata DB for UI
This flow shows how the Airflow components interact: the user submits DAG files, the webserver makes them visible in the UI, the scheduler scans and triggers tasks, the executor runs them, and the metadata DB stores the state that the webserver reads back for display.
Execution Sample
# Simplified pseudocode of the Airflow control flow (not real API calls)
webserver.start()            # serve the UI
scheduler.scan_dags()        # find tasks that are ready to run
scheduler.trigger_tasks()    # queue ready task instances
executor.run_tasks()         # execute tasks on workers
metadata_db.update_state()   # record task states
webserver.show_ui()          # read states back for display
This code simulates the main Airflow components working together to run workflows and update the UI.
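The same flow can be sketched as a small, self-contained Python simulation. All class names here (MetadataDB, Executor, Scheduler) are illustrative stand-ins, not real Airflow classes:

```python
# Toy simulation of the flow above. Class names are illustrative only --
# they are not part of the real Airflow API.

class MetadataDB:
    """Stands in for the metadata database that stores task states."""
    def __init__(self):
        self.task_states = {}

    def update_state(self, task_id, state):
        self.task_states[task_id] = state


class Executor:
    """Runs tasks and records their state transitions in the DB."""
    def __init__(self, db):
        self.db = db

    def run(self, task_id):
        self.db.update_state(task_id, "running")
        # ... the task body would execute here ...
        self.db.update_state(task_id, "success")


class Scheduler:
    """Scans a DAG's tasks and hands them to the executor."""
    def __init__(self, executor):
        self.executor = executor

    def scan_and_trigger(self, dag_tasks):
        for task_id in dag_tasks:
            self.executor.run(task_id)


db = MetadataDB()
Scheduler(Executor(db)).scan_and_trigger(["extract", "transform", "load"])
print(db.task_states)
# {'extract': 'success', 'transform': 'success', 'load': 'success'}
```

Note that the "webserver" never appears in the run path: in Airflow it only reads the metadata DB, which is why the simulation ends with the DB holding the final task states.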
Process Table
| Step | Component | Action | Result/State Change |
|------|-----------|--------|---------------------|
| 1 | User | Submits DAG file | DAG file available to Webserver and Scheduler |
| 2 | Webserver | Receives DAG and serves UI | DAG visible in Airflow UI |
| 3 | Scheduler | Scans DAG files | Identifies tasks to schedule |
| 4 | Scheduler | Triggers task instances | Tasks queued for execution |
| 5 | Executor | Runs tasks | Tasks execute on workers |
| 6 | Tasks | Update Metadata DB | Task states updated (running, success, failed) |
| 7 | Webserver | Reads Metadata DB | UI shows updated task status |
| 8 | Scheduler | Repeats scanning and triggering | Continuous workflow management |
| 9 | Exit | No new tasks to schedule | Waits for next cycle |
💡 Scheduler waits when no tasks are ready; cycle repeats continuously
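The repeating scan/trigger/wait cycle (Steps 3, 4, 8, 9) can be sketched as a small polling loop. This is a hypothetical illustration, not Airflow's actual scheduler code; `max_cycles` bounds the loop for the example, whereas a real scheduler runs until stopped:

```python
import time

def scheduler_loop(scan_dags, trigger, max_cycles=3, poll_interval=0.01):
    """Illustrative scheduler cycle: scan, trigger ready tasks, then wait."""
    for _ in range(max_cycles):
        for task in scan_dags():      # Step 3: scan DAGs for ready tasks
            trigger(task)             # Step 4: queue task instances
        time.sleep(poll_interval)     # Step 9: wait before the next cycle

triggered = []
scheduler_loop(scan_dags=lambda: ["task_a"], trigger=triggered.append)
print(triggered)
# ['task_a', 'task_a', 'task_a']
```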
Status Tracker
| Variable | Start | After Step 3 | After Step 4 | After Step 6 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| DAG files | Empty | Loaded | Loaded | Loaded | Loaded |
| Task queue | Empty | Empty | Queued | Running/Completed | Empty or updated |
| Metadata DB | Empty | Empty | Empty | Updated with task states | Updated |
| UI display | Empty | Shows DAGs | Shows DAGs | Shows running tasks | Shows updated task states |
Key Moments - 3 Insights
Why does the scheduler scan DAG files repeatedly?
The scheduler scans DAG files continuously (see Step 3 and Step 8) to detect new or updated workflows and to trigger tasks as needed.
How does the webserver know the current status of tasks?
The webserver reads the Metadata DB (Step 7) where task states are updated by running tasks, so it can show accurate status in the UI.
What role does the executor play in task execution?
The executor runs the actual tasks on workers (Step 5), changing their state which is then recorded in the Metadata DB.
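The state changes the executor drives (Steps 5-6) follow a simple progression: queued, then running, then success or failed. A minimal sketch of that state machine, with transition names taken from the process table (not from Airflow's actual state module):

```python
# Illustrative task-state machine: the executor moves a task from
# queued -> running, and the terminal state is recorded in the metadata DB.
VALID_TRANSITIONS = {
    "queued": {"running"},
    "running": {"success", "failed"},
}

def transition(current, new):
    if new not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new

state = "queued"
state = transition(state, "running")   # executor picks up the task
state = transition(state, "success")   # task finishes cleanly
print(state)
# success
```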
Visual Quiz - 3 Questions
Test your understanding
Looking at the process table, at which step does the scheduler trigger tasks?
A. Step 3
B. Step 5
C. Step 4
D. Step 7
💡 Hint
Check the 'Action' column for when tasks are queued by the scheduler.
According to the variable tracker, what is the state of the task queue after Step 6?
A. Running/Completed
B. Empty
C. Queued
D. Loaded
💡 Hint
Look at the 'Task queue' row under 'After Step 6' in the variable tracker.
If the Metadata DB did not update after task execution, what would the webserver show?
A. No DAGs available
B. Outdated or no task status
C. Updated task status
D. Scheduler status
💡 Hint
Refer to Step 7 where the webserver reads the Metadata DB for UI updates.
Concept Snapshot
Airflow architecture:
- Webserver: serves UI and DAGs
- Scheduler: scans DAGs, triggers tasks
- Executor: runs tasks on workers
- Metadata DB: stores task and DAG states
Scheduler and executor work continuously to manage workflows.
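The webserver's read-only role (Step 7) can be shown in a few lines. Here a plain dict stands in for the metadata DB; `render_status` is a hypothetical helper, not a real Airflow function:

```python
# Toy "webserver" view: the UI only *reads* the metadata store,
# it never runs tasks itself.
def render_status(metadata):
    return "\n".join(f"{task}: {state}"
                     for task, state in sorted(metadata.items()))

metadata_db = {"extract": "success", "transform": "running"}
print(render_status(metadata_db))
# extract: success
# transform: running
```

This separation is why a stale metadata DB produces a stale UI: the webserver has no other source of task state.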
Full Transcript
Airflow architecture involves four main parts working together. The user submits DAG files which the webserver receives and shows in the UI. The scheduler scans these DAGs repeatedly to find tasks to run. When tasks are ready, the scheduler triggers them and the executor runs them on workers. As tasks run, they update their status in the metadata database. The webserver reads this database to show the current status of workflows and tasks. This cycle repeats continuously to manage workflows smoothly.