
DAG performance tracking in Apache Airflow - Step-by-Step Execution

Process Flow - DAG performance tracking
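1. DAG run starts
2. Record the DAG run's start time
3. Run each task in the DAG
4. Record each task's start and end times
5. Calculate each task's duration (end time minus start time)
6. Aggregate task durations (for example, to compare tasks across runs)
7. Calculate total DAG run time (DAG run end time minus DAG run start time)
8. Store performance metrics
9. Visualize or alert on performance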
This flow shows how Airflow tracks the start and end times of each task and the entire DAG to measure performance.
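The core arithmetic of this flow can be sketched in plain Python. This is a minimal illustration, not Airflow's internal code. The names `TASK_TIMES`, `task_durations` and `dag_total_duration` are hypothetical. The timestamps are the sample times from the step-by-step table in this section, placed on an arbitrarily chosen date. In a real deployment, Airflow records these start and end times in its metadata database for you. Note that summing the task durations does not give the DAG run time, because the run also includes gaps between tasks.

```python
from datetime import datetime

# Sample timestamps from the step-by-step table (the date itself is arbitrary).
DAG_START = datetime(2024, 1, 1, 10, 0, 0)
DAG_END = datetime(2024, 1, 1, 10, 0, 50)
TASK_TIMES = {  # hypothetical structure: task name -> (start, end)
    "Task A": (datetime(2024, 1, 1, 10, 0, 5), datetime(2024, 1, 1, 10, 0, 15)),
    "Task B": (datetime(2024, 1, 1, 10, 0, 16), datetime(2024, 1, 1, 10, 0, 30)),
    "Task C": (datetime(2024, 1, 1, 10, 0, 31), datetime(2024, 1, 1, 10, 0, 45)),
}


def task_durations(task_times):
    """Duration of each task in seconds: its end time minus its start time."""
    return {name: (end - start).total_seconds() for name, (start, end) in task_times.items()}


def dag_total_duration(dag_start, dag_end):
    """Total DAG run time in seconds, measured from the run's own start and end."""
    return (dag_end - dag_start).total_seconds()


durations = task_durations(TASK_TIMES)
total = dag_total_duration(DAG_START, DAG_END)
print(durations)                # {'Task A': 10.0, 'Task B': 14.0, 'Task C': 14.0}
print(total)                    # 50.0
print(sum(durations.values()))  # 38.0 (less than 50 because of gaps between tasks)
```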
Execution Sample
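from airflow.models import DagRun
from airflow.utils.state import State

# Fetch finished runs of the DAG (Airflow 2.x returns them ordered by execution date)
dag_runs = DagRun.find(dag_id='example_dag', state=State.SUCCESS)
dag_run = dag_runs[-1]  # most recent successful run

# Calculate duration as a datetime.timedelta (end_date stays None until a run finishes)
run_duration = dag_run.end_date - dag_run.start_date
print(run_duration.total_seconds())
This code fetches the most recent successful run of example_dag and calculates its total duration by subtracting the run's start_date from its end_date. Filtering on State.SUCCESS ensures that end_date is set. The code assumes at least one successful run exists. The result is a datetime.timedelta, which total_seconds() converts to seconds.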
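A run that is still in progress has no `end_date` yet (it is `None`), so subtracting `start_date` from it raises a `TypeError`. The helper below is a hypothetical sketch that handles this case. It works on any object exposing `start_date` and `end_date` attributes, such as a DagRun. It measures an in-progress run up to "now". It treats a missing `start_date` as "not started yet", which is an assumption about queued runs. `SimpleNamespace` objects stand in for real runs so the example runs without Airflow. The datetimes are timezone-aware UTC values, matching how Airflow stores them.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def run_duration_seconds(run, now=None):
    """Elapsed seconds for any object with start_date/end_date attributes (e.g. a DagRun).

    Assumes timezone-aware UTC datetimes. While a run is in progress its
    end_date is None, so the duration is measured up to `now` instead.
    """
    if run.start_date is None:
        return None  # assumption: no start_date means the run has not started yet
    end = run.end_date
    if end is None:
        end = now if now is not None else datetime.now(timezone.utc)
    return (end - run.start_date).total_seconds()


# Stand-ins for real DagRun objects so the example runs without Airflow.
start = datetime(2024, 1, 1, 10, 0, 0, tzinfo=timezone.utc)
finished = SimpleNamespace(start_date=start, end_date=start + timedelta(seconds=50))
running = SimpleNamespace(start_date=start, end_date=None)

print(run_duration_seconds(finished))  # 50.0
print(run_duration_seconds(running, now=start + timedelta(seconds=20)))  # 20.0
```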
Execution Table
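| Step | Action | Task | Start Time | End Time | Duration | DAG Total Duration |
|---|---|---|---|---|---|---|
| 1 | DAG run started | N/A | 10:00:00 | - | - | - |
| 2 | Task A started | Task A | 10:00:05 | - | - | - |
| 3 | Task A ended | Task A | - | 10:00:15 | 10 seconds | - |
| 4 | Task B started | Task B | 10:00:16 | - | - | - |
| 5 | Task B ended | Task B | - | 10:00:30 | 14 seconds | - |
| 6 | Task C started | Task C | 10:00:31 | - | - | - |
| 7 | Task C ended | Task C | - | 10:00:45 | 14 seconds | - |
| 8 | DAG run ended | N/A | - | 10:00:50 | - | 50 seconds |
| 9 | Performance metrics stored | N/A | - | - | - | 50 seconds |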
💡 The DAG run ends at 10:00:50 (step 8). Its total duration is 50 seconds, measured from the DAG run start at 10:00:00, and this value is calculated and stored (step 9).
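To see where the 50 seconds go, the table's timings can be split into time spent running tasks and idle gaps. The sketch below writes every time as seconds after the DAG run started at 10:00:00. `breakdown` is a hypothetical helper, and it assumes tasks run one after another without overlapping, as they do in this example.

```python
# Task windows in seconds after the DAG run started at 10:00:00 (from the table).
DAG_START, DAG_END = 0, 50
TASKS = [("Task A", 5, 15), ("Task B", 16, 30), ("Task C", 31, 45)]


def breakdown(dag_start, dag_end, tasks):
    """Split total DAG run time into task running time and idle gaps.

    Assumes tasks run sequentially (no overlap), as in this example.
    """
    total = dag_end - dag_start
    busy = sum(end - start for _, start, end in tasks)
    ordered = sorted(tasks, key=lambda t: t[1])  # order by start time
    gaps = [ordered[0][1] - dag_start]  # before the first task
    gaps += [nxt[1] - prev[2] for prev, nxt in zip(ordered, ordered[1:])]  # between tasks
    gaps.append(dag_end - ordered[-1][2])  # after the last task
    return total, busy, gaps


total, busy, gaps = breakdown(DAG_START, DAG_END, TASKS)
print(total, busy, gaps, sum(gaps))  # 50 38 [5, 1, 1, 5] 12
```

The 38 seconds of task time plus 12 seconds of gaps account for the full 50-second run.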
Status Tracker
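| Variable | Start | After Step 3 | After Step 5 | After Step 7 | Final |
|---|---|---|---|---|---|
| Task A Duration | 0 | 10 seconds | 10 seconds | 10 seconds | 10 seconds |
| Task B Duration | 0 | 0 | 14 seconds | 14 seconds | 14 seconds |
| Task C Duration | 0 | 0 | 0 | 14 seconds | 14 seconds |
| DAG Total Duration | 0 | 0 | 0 | 0 | 50 seconds |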
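The Status Tracker can be reproduced by replaying the table's events in order and taking a snapshot of the tracked values after each step. The event tuples and the `replay` function are hypothetical illustrations, not an Airflow data format. Times are seconds after 10:00:00.

```python
# Events from the table: (step, kind, task, seconds after 10:00:00).
EVENTS = [
    (1, "dag_start", None, 0),
    (2, "task_start", "Task A", 5),
    (3, "task_end", "Task A", 15),
    (4, "task_start", "Task B", 16),
    (5, "task_end", "Task B", 30),
    (6, "task_start", "Task C", 31),
    (7, "task_end", "Task C", 45),
    (8, "dag_end", None, 50),
]


def replay(events):
    """Replay events in order and snapshot the tracked durations after each step."""
    started = {}  # task name -> start time
    state = {"Task A": 0, "Task B": 0, "Task C": 0, "DAG total": 0}
    dag_start = None
    snapshots = {}
    for step, kind, task, t in events:
        if kind == "dag_start":
            dag_start = t
        elif kind == "task_start":
            started[task] = t
        elif kind == "task_end":
            state[task] = t - started[task]  # end minus start
        elif kind == "dag_end":
            state["DAG total"] = t - dag_start  # DAG run end minus DAG run start
        snapshots[step] = dict(state)  # copy so later steps don't overwrite it
    return snapshots


snapshots = replay(EVENTS)
for step in (3, 5, 7, 8):
    print(step, snapshots[step])
```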
Key Moments - 3 Insights
Why does the DAG total duration include gaps between tasks?
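The DAG total duration is measured from the start of the DAG run (step 1, 10:00:00) to its end (step 8, 10:00:50). It therefore includes any waiting or scheduling delays: 5 seconds before Task A starts, 1 second between each pair of consecutive tasks, and 5 seconds after Task C ends. The tasks themselves run for 10 + 14 + 14 = 38 seconds; the remaining 12 seconds are gaps.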
How is each task's duration calculated?
Each task's duration is its end time minus its start time. For example, Task A starts at 10:00:05 (step 2) and ends at 10:00:15 (step 3), so its duration is 10 seconds.
Why do we store performance metrics after the DAG run ends?
Storing metrics after the DAG completes (step 9) allows tracking historical performance and detecting slowdowns or failures over time.
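Once durations are stored, a simple way to detect a slowdown is to compare the latest run against a baseline from earlier runs. This sketch uses the median of past durations and a 1.5x threshold. `is_slowdown`, the threshold, and the sample history are assumptions for illustration, not Airflow settings.

```python
from statistics import median


def is_slowdown(history_seconds, latest_seconds, factor=1.5):
    """Flag the latest run if it exceeds `factor` times the median of earlier runs.

    The default factor of 1.5 is an arbitrary example value.
    """
    if not history_seconds:
        return False  # nothing to compare against yet
    baseline = median(history_seconds)
    return latest_seconds > factor * baseline


# Hypothetical stored durations (seconds) of previous runs of the same DAG.
history = [50, 48, 52, 51, 49]
print(is_slowdown(history, 55))  # False: 55 <= 1.5 * 50
print(is_slowdown(history, 90))  # True: 90 > 75
```

The median is used instead of the mean so that a single unusually slow past run does not inflate the baseline.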
Visual Quiz - 3 Questions
Test your understanding
In the execution table, what is the duration of Task B at step 5?
A) 14 seconds
B) 10 seconds
C) 16 seconds
D) 30 seconds
💡 Hint
Check the 'Duration' column for Task B at step 5 in the execution table.
At which step does the DAG run end according to the execution table?
A) Step 7
B) Step 9
C) Step 8
D) Step 6
💡 Hint
Look for the row with action 'DAG run ended' in the execution table.
If Task C took 20 seconds instead of 14, how would the DAG total duration change?
A) It would remain 50 seconds
B) It would increase to 56 seconds plus any gaps
C) It would increase to 56 seconds
D) It would decrease
💡 Hint
Remember that the 50-second total already includes all task durations and the gaps around them. If the gaps stay the same, only Task C's extra 6 seconds of running time is added.
Concept Snapshot
DAG Performance Tracking in Airflow:
- Track start and end times of each task
- Calculate task durations (end - start)
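- Compute total DAG run time as the DAG run's end time minus its start time (this includes gaps between tasks)
- Store metrics for monitoring
- Read timings from the Airflow metadata DB (start_date and end_date on DAG runs and task instances) or from logs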
- Helps identify slow tasks and bottlenecks
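To identify slow tasks, per-task durations can be ranked by their share of the summed task time. The sketch below uses the durations from the example run. `rank_tasks` is a hypothetical helper. Ties are broken alphabetically so the order is deterministic. The shares are relative to the 38 seconds of task time, not the 50-second DAG run.

```python
def rank_tasks(durations):
    """Return (task, seconds, share of summed task time), slowest first.

    Ties are broken by task name so the order is deterministic.
    """
    total = sum(durations.values())
    ranked = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, secs, secs / total) for name, secs in ranked]


durations = {"Task A": 10, "Task B": 14, "Task C": 14}
for name, secs, share in rank_tasks(durations):
    print(f"{name}: {secs}s ({share:.0%})")
```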
Full Transcript
DAG performance tracking in Airflow involves recording the start and end times of each task and of the entire DAG run. This allows calculation of individual task durations and of the total time the DAG run took to complete. The process begins when the DAG run starts and its start time is recorded. Each task's start and end times are tracked to compute its duration. When the DAG run ends, its total duration is calculated as the DAG run's end time minus its start time. This total includes any waiting or scheduling gaps before, between, and after tasks. These performance metrics are stored for monitoring and analysis, which helps detect slow tasks and optimize workflows.