
What is Apache Airflow - Visual Explanation

Process Flow - What is Apache Airflow
Start Airflow Scheduler
Scheduler checks DAGs
Identify tasks to run
Trigger task execution
Workers run tasks
Tasks update status
Scheduler monitors progress
Repeat for next schedule
Airflow runs by scheduling workflows (DAGs), triggering tasks, and monitoring their execution continuously.
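The cycle above can be illustrated with a toy simulation in plain Python. This is only a sketch under simplifying assumptions, not how Airflow is implemented: `ToyTask` and `scheduler_pass` are hypothetical names, the "worker" is an ordinary function call in the same process, and each call to `scheduler_pass` stands in for one turn of the scheduler's continuous loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task; not an Airflow class."""
    task_id: str
    run: Callable[[], str]
    state: Optional[str] = None   # None -> scheduled -> running -> success/failed
    output: Optional[str] = None


def scheduler_pass(tasks: List[ToyTask]) -> List[Tuple[str, str]]:
    """One pass: pick tasks that have not run, execute them, record each state change."""
    history = []
    for task in tasks:
        if task.state is not None:
            continue  # handled in an earlier pass
        for state in ("scheduled", "running"):
            task.state = state
            history.append((task.task_id, state))
        try:
            task.output = task.run()   # the "worker" step
            task.state = "success"
        except Exception:
            task.state = "failed"
        history.append((task.task_id, task.state))
    return history


say_hello = ToyTask(task_id="say_hello", run=lambda: "Hello Airflow")
first_pass = scheduler_pass([say_hello])
second_pass = scheduler_pass([say_hello])   # nothing left to do

print(first_pass)
print(second_pass)
```

The first pass walks the task through scheduled, running, and success; the second pass finds nothing to run, which mirrors the scheduler idling until the next schedule.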
Execution Sample
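from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A DAG groups tasks into one workflow; start_date is the earliest date it can be scheduled from.
dag = DAG('hello_world', start_date=datetime(2024, 1, 1))

# A single task that runs a shell command on a worker.
hello = BashOperator(task_id='say_hello', bash_command='echo Hello Airflow', dag=dag)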
Defines a simple Airflow workflow that runs a bash command to print 'Hello Airflow'.
Process Table
| Step | Action | Evaluation | Result |
|------|--------|------------|--------|
| 1 | Scheduler starts and reads DAG file | DAG 'hello_world' found | DAG registered in Airflow |
| 2 | Scheduler checks if task 'say_hello' should run | Start date reached | Task scheduled for execution |
| 3 | Worker picks up 'say_hello' task | Runs bash command | Outputs 'Hello Airflow' |
| 4 | Task completes | Success status recorded | Task marked as done |
| 5 | Scheduler waits for next run | No new tasks ready | Idle until next schedule |
💡 No more tasks to run at this time; scheduler waits for next scheduled run.
Status Tracker
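| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | Final |
|----------|-------|--------------|--------------|--------------|--------------|-------|
| DAG 'hello_world' | Not loaded | Loaded | Loaded | Loaded | Loaded | Loaded |
| Task 'say_hello' status | None | None | Scheduled | Running | Success | Success |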
Key Moments - 2 Insights
Why does the scheduler keep running even after tasks complete?
The scheduler continuously monitors DAGs to trigger tasks at their scheduled times, as shown in step 5 where it waits for the next run.
What happens if the start_date of a DAG is in the future?
The scheduler will not schedule tasks until the start_date is reached, so no tasks run before that time (see step 2 evaluation).
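The start_date rule can be sketched as a simple date comparison. This is a minimal illustration, assuming a simplified rule: `may_schedule` is a hypothetical helper, not an Airflow function, and Airflow's real timing also depends on the DAG's schedule.

```python
from datetime import datetime, timezone


def may_schedule(start_date: datetime, now: datetime) -> bool:
    """Return True once the start date has been reached (simplified rule)."""
    return now >= start_date


# Fixed example dates so the result does not depend on when the code runs.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
past_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
future_start = datetime(2030, 1, 1, tzinfo=timezone.utc)

print(may_schedule(past_start, now))    # True
print(may_schedule(future_start, now))  # False
```

With a start date in the future the check stays false, so the DAG can still be loaded but none of its tasks are scheduled yet.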
Visual Quiz - 3 Questions
Test your understanding
Look at the Process Table. What is the status of the task 'say_hello' after step 3?
A. Scheduled
B. Running
C. Success
D. None
💡 Hint
Check the "Task 'say_hello' status" row in the Status Tracker after Step 3.
At which step does the scheduler mark the task as completed successfully?
A. Step 4
B. Step 3
C. Step 2
D. Step 5
💡 Hint
Look for 'Task completes' and 'Success status recorded' in the Process Table.
If the DAG's start_date were set to a future date, what would change in the Process Table?
A. Task would be scheduled immediately
B. Scheduler would skip loading the DAG
C. Task would not be scheduled until start_date is reached
D. Task would run twice
💡 Hint
Refer to the Evaluation column of step 2, which checks the start_date condition.
Concept Snapshot
Apache Airflow is a tool to schedule and run workflows called DAGs.
It uses a scheduler to trigger tasks at set times.
Workers execute tasks and report status.
The scheduler keeps running to manage ongoing workflows.
DAGs define tasks and their order.
The start_date sets the earliest date from which tasks can be scheduled.
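The point that a DAG defines tasks and their order can be illustrated with Python's standard-library `graphlib`. This is a sketch of the general idea of ordering by dependencies, not Airflow's own code; the three task names are hypothetical and do not come from the example above.

```python
from graphlib import TopologicalSorter

# Hypothetical three-task workflow: each key lists the tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields every task only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['extract', 'transform', 'load']
```

In an Airflow DAG file the same chain is usually declared between task objects with the `>>` operator, for example `extract >> transform >> load`.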
Full Transcript
Apache Airflow is a system that helps you run and manage workflows automatically. It uses a scheduler that reads workflow definitions called DAGs. The scheduler checks when tasks should run and tells workers to execute them. Workers run the tasks and report back if they succeeded or failed. The scheduler keeps running to handle tasks as they become ready. A simple example is a DAG that runs a bash command to print 'Hello Airflow'. The scheduler loads this DAG, schedules the task when the start date arrives, the worker runs the command, and the task completes successfully. The scheduler then waits for the next scheduled run. This continuous cycle allows Airflow to manage complex workflows reliably.