Airflow metrics with Prometheus - Time & Space Complexity
When Airflow sends metrics to Prometheus, it collects data from many tasks and DAGs. We want to understand how the time to gather these metrics grows as the number of tasks or DAGs increases.
How does the work needed to collect metrics change when we add more tasks?
Analyze the time complexity of this Airflow metrics collection snippet.
```python
from prometheus_client import Gauge

# Create the gauge once at import time; registering the same metric
# name twice raises a ValueError in prometheus_client.
task_gauge = Gauge('airflow_task_duration', 'Duration of Airflow tasks', ['dag_id', 'task_id'])

def collect_task_metrics(dags):
    for dag in dags:
        for task in dag.tasks:
            duration = task.get_duration()
            task_gauge.labels(dag_id=dag.dag_id, task_id=task.task_id).set(duration)
```
This code loops through all DAGs and their tasks to collect duration metrics for Prometheus.
Look at the loops that repeat the work:
- Primary operation: setting one gauge value per task, inside a loop over each DAG and a nested loop over that DAG's tasks.
- How many times: once per task, so the total iteration count equals the total number of tasks across all DAGs (call it n).
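The counting argument above can be sketched directly. The `SimpleNamespace` stubs below stand in for Airflow DAG objects and are illustrative only, not Airflow classes:

```python
from types import SimpleNamespace

def count_metric_sets(dags):
    """Count how many times the inner loop body (one gauge set) runs."""
    ops = 0
    for dag in dags:
        for task in dag.tasks:
            ops += 1
    return ops

# Three stub DAGs with 2, 3, and 5 tasks: 10 tasks total -> 10 operations.
dags = [SimpleNamespace(tasks=[object()] * n) for n in (2, 3, 5)]
print(count_metric_sets(dags))  # prints 10
```

However the tasks are distributed across DAGs, the operation count is always the total task count, which is the O(n) behavior tabulated below.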
As the number of DAGs and tasks grows, the work grows too:
| Input Size (total tasks) | Approx. Operations |
|---|---|
| 10 | About 10 metric sets |
| 100 | About 100 metric sets |
| 1000 | About 1000 metric sets |
Pattern observation: The work grows directly with the number of tasks. Double the tasks, double the work.
Time Complexity: O(n)
This means the time to collect metrics grows linearly with the number of tasks n. Space complexity is also O(n): the gauge keeps one time series per (dag_id, task_id) label pair, so memory grows with the task count as well.
[X] Wrong: "Collecting metrics is always fast and does not depend on the number of tasks."
[OK] Correct: Each task adds work because the code loops through all tasks. More tasks mean more time needed.
Understanding how metric collection scales helps you design efficient monitoring in real projects. It shows you can think about how systems behave as they grow, a key skill in DevOps.
"What if we cached task durations instead of collecting them every time? How would the time complexity change?"
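One hedged answer to that question: caching does not remove the O(n) loop, because every task's gauge still has to be set on each collection pass, but it can turn each per-task lookup from an expensive query into an O(1) cache hit. A minimal sketch, where `expensive_duration_lookup` is a hypothetical stand-in (not an Airflow API) and the counter just shows how often it really runs:

```python
from functools import lru_cache

calls = 0

def expensive_duration_lookup(dag_id, task_id):
    """Stand-in for a slow metadata-database query; counts its invocations."""
    global calls
    calls += 1
    return 42.0  # dummy duration in seconds

@lru_cache(maxsize=None)
def cached_duration(dag_id, task_id):
    # Finished tasks have fixed durations, so a cache keyed by
    # (dag_id, task_id) can serve repeat collections without re-querying.
    return expensive_duration_lookup(dag_id, task_id)

# Two collection passes over the same task: the expensive lookup runs once.
for _ in range(2):
    cached_duration('example_dag', 'extract')
print(calls)  # prints 1
```

Note the trade-off: in real Airflow a task can be retried and get a new duration, so a production cache would need an invalidation key (such as the try number), and the cache itself adds O(n) memory.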