
Airflow metrics with Prometheus - Step-by-Step Execution

Process Flow - Airflow metrics with Prometheus
1. Airflow Scheduler runs tasks
2. Airflow exposes a metrics endpoint
3. Prometheus scrapes the metrics endpoint
4. Prometheus stores the metrics data
5. User queries the metrics or visualizes them in Grafana
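Airflow runs tasks and exposes metrics on an HTTP endpoint. Prometheus scrapes that endpoint at a regular interval and stores the results as time series, which users can query directly or chart in Grafana. Out of the box, Airflow emits its built-in metrics over StatsD, so in practice the Prometheus-readable endpoint is usually served by an exporter (for example statsd_exporter) or by custom code such as the sample under Execution Sample.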
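The pull model described above can be sketched with the standard library alone. This is a minimal illustration, not how Airflow or Prometheus are implemented: `MetricsHandler`, `scrape`, `demo_scrape`, and the fixed `METRICS_TEXT` payload are hypothetical names and values made up for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload in the Prometheus text exposition format.
METRICS_TEXT = (
    "# HELP airflow_task_status Status of Airflow tasks\n"
    "# TYPE airflow_task_status gauge\n"
    "airflow_task_status 1.0\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves METRICS_TEXT on /metrics and returns 404 for other paths."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_TEXT.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def scrape(url):
    """Fetch one snapshot of the metrics page, as a scraper would."""
    # An empty ProxyHandler makes the request go straight to the target.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read().decode("utf-8")


def demo_scrape():
    # Port 0 asks the OS for any free port, avoiding clashes with other services.
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return scrape(f"http://127.0.0.1:{port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

The target never pushes anything. Data moves only when the scraper issues a GET request, which is why the scrape interval bounds how fresh the stored metrics can be.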
Execution Sample
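from prometheus_client import Gauge, start_http_server

# Gauge for task status; the "1 = succeeded" convention here is illustrative.
task_status = Gauge('airflow_task_status', 'Status of Airflow tasks')

# Serve /metrics on port 8000 (8080 is the Airflow webserver's default port).
start_http_server(8000)

# Airflow code updates the gauge as task states change, for example:
task_status.set(1)
# Prometheus scrapes http://localhost:8000/metrics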
This starts an HTTP server inside the Airflow process that serves a gauge in the Prometheus format. Prometheus then scrapes that server's /metrics path.
Process Table
Step | Action | Airflow State | Prometheus Action | Result
1 | Airflow Scheduler starts task | Task running | No scrape yet | Metrics updated internally
2 | Prometheus scrapes /metrics | Task running | Scrape metrics endpoint | Metrics data collected
3 | Task completes | Task succeeded | No scrape | Metrics reflect task success
4 | Prometheus scrapes /metrics again | Task succeeded | Scrape metrics endpoint | Updated metrics collected
5 | User queries Prometheus | N/A | Query metrics data | Task success metrics shown
6 | User visualizes in Grafana | N/A | Use Prometheus data | Dashboard shows Airflow metrics
💡 Process continues as Airflow runs tasks and Prometheus scrapes metrics periodically
Status Tracker
Variable | Start | After Step 1 | After Step 3 | After Step 4 | Final
Airflow Task State | None | Running | Succeeded | Succeeded | Succeeded
Prometheus Metrics Data | Empty | Empty | Old metrics | Updated metrics | Updated metrics
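The lag between what Airflow knows and what Prometheus has stored can be reproduced with a small simulation. This is a sketch under simplifying assumptions: `MetricsTarget` and `Scraper` are hypothetical stand-ins for the Airflow side and the Prometheus side, and the task state is kept as a plain string rather than a numeric metric.

```python
class MetricsTarget:
    """Stand-in for the Airflow side: holds the current task state."""

    def __init__(self):
        self.task_state = None

    def set_state(self, state):
        self.task_state = state

    def expose(self):
        # What a scrape would read at this instant.
        return {"task_state": self.task_state}


class Scraper:
    """Stand-in for Prometheus: stores only what it saw at scrape time."""

    def __init__(self, target):
        self.target = target
        self.samples = []

    def scrape(self):
        self.samples.append(self.target.expose())

    def latest(self):
        # None means nothing has been scraped yet.
        return self.samples[-1]["task_state"] if self.samples else None


def run_timeline():
    """Replay steps 1 to 4 and record what the scraper knows after each."""
    target = MetricsTarget()
    prom = Scraper(target)
    seen = []

    target.set_state("running")    # step 1: task starts, no scrape yet
    seen.append(prom.latest())
    prom.scrape()                  # step 2: first scrape
    seen.append(prom.latest())
    target.set_state("succeeded")  # step 3: task completes, no scrape
    seen.append(prom.latest())
    prom.scrape()                  # step 4: second scrape
    seen.append(prom.latest())
    return seen
```

After step 3 the scraper still reports the running state from step 2, which corresponds to the "Old metrics" entry in the tracker.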
Key Moments - 3 Insights
Why doesn't Prometheus collect metrics before scraping?
Prometheus uses a pull model: it has data only after it sends a request to the Airflow metrics endpoint. Until the first scrape happens, nothing is stored, even though Airflow is already updating its metrics (see Process Table, step 1).
How does Airflow update metrics when tasks change state?
Airflow updates its internal metrics as task states change, but Prometheus sees those updates only at the next scrape (see Process Table, steps 3 and 4).
Why do we need a metrics endpoint in Airflow?
The metrics endpoint exposes Airflow's internal metrics in a format Prometheus understands, enabling Prometheus to scrape and store them (see Process Flow).
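To make "a format Prometheus understands" concrete, the sketch below renders a gauge in the Prometheus text exposition format by hand. In real code a client library such as prometheus_client produces this output. The helper names `render_gauge` and `escape_label_value`, and the `task_id` and `state` labels in the usage example, are assumptions made for illustration.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge family as exposition text.

    samples is a list of (labels_dict, value) pairs; labels_dict may be empty.
    HELP text is assumed to contain no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is deterministic.
            label_text = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_text}}} {float(value)}")
        else:
            lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"
```

For example, `render_gauge("airflow_task_status", "Status of Airflow tasks", [({"task_id": "extract", "state": "success"}, 1)])` yields a `# HELP` line, a `# TYPE` line, and one sample line carrying both labels.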
Visual Quiz - 3 Questions
Test your understanding
According to the Process Table, what is the Airflow task state after step 3?
A. Failed
B. Running
C. Succeeded
D. None
💡 Hint
Check the "Airflow Task State" row under "After Step 3" in the Status Tracker.
At which step does Prometheus first collect updated metrics reflecting task completion?
A. Step 2
B. Step 4
C. Step 3
D. Step 5
💡 Hint
Look at the Process Table rows where Prometheus scrapes the endpoint after the task completes.
If Airflow did not expose a metrics endpoint, what would happen?
A. Prometheus would fail to collect metrics
B. Airflow tasks would not run
C. Prometheus would scrape metrics normally
D. Metrics would be collected automatically
💡 Hint
Refer to the Key Moments entry on why the metrics endpoint is needed.
Concept Snapshot
Airflow exposes task metrics via an HTTP endpoint.
Prometheus scrapes this endpoint regularly to collect metrics.
Metrics reflect task states such as running or succeeded.
Prometheus stores these metrics and makes them queryable.
Visualize metrics with tools like Grafana.
Without the endpoint, Prometheus cannot collect metrics.
Full Transcript
Airflow runs tasks and updates internal metrics about their states. It exposes these metrics on a special HTTP endpoint. Prometheus regularly scrapes this endpoint to collect the latest metrics data. When a task starts, Airflow updates metrics internally, but Prometheus only sees these updates after scraping. When the task completes, Airflow updates metrics again, and Prometheus collects the updated data on the next scrape. Users can query this data in Prometheus or visualize it in Grafana dashboards. The key is that Airflow must expose a metrics endpoint for Prometheus to collect metrics. Without it, Prometheus cannot gather any data. This flow repeats continuously as Airflow runs tasks and Prometheus scrapes metrics.
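Querying the stored data can also be done programmatically through the Prometheus HTTP API, whose instant-query path is /api/v1/query. The sketch below only builds the request URL and parses a response body, so it runs without a server. The helper names are hypothetical, and the base URL and the payload in the usage note are hand-written illustrations, not output captured from a real Prometheus instance.

```python
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_samples(payload):
    """Return (labels, value) pairs from an instant-vector query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each result carries a [timestamp, "value-as-string"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]
```

As a usage example, assuming Prometheus listens on its default port 9090 on localhost, `build_query_url("http://localhost:9090", "airflow_task_status")` gives the URL to fetch. A decoded response shaped like `{"status": "success", "data": {"resultType": "vector", "result": [{"metric": {"__name__": "airflow_task_status"}, "value": [1700000000, "1"]}]}}` would then parse to a single sample with value 1.0.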