Task failure callbacks in Apache Airflow - Time & Space Complexity
When Airflow runs tasks, it can call a special function whenever a task fails. Understanding how often these callbacks run helps us see how the system behaves as the number of tasks increases.
We want to know how the work done by failure callbacks grows as the number of tasks grows.
Analyze the time complexity of the following Airflow DAG snippet with failure callbacks.
```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime


def on_failure_callback(context):
    print(f"Task {context['task_instance'].task_id} failed.")


def task_function():
    pass


dag = DAG('example_failure_callback', start_date=datetime(2024, 1, 1))

task1 = PythonOperator(
    task_id='task1',
    python_callable=task_function,
    on_failure_callback=on_failure_callback,
    dag=dag
)

task2 = PythonOperator(
    task_id='task2',
    python_callable=task_function,
    on_failure_callback=on_failure_callback,
    dag=dag
)
```
This code sets up two tasks in a DAG, each with a failure callback that runs if the task fails.
Look at what repeats when tasks run and fail.
- Primary operation: The failure callback function runs once per failed task.
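- How many times: At most once per failed task in a DAG run, so at most n times for a DAG with n tasks. A task that is only being retried triggers on_retry_callback instead; on_failure_callback runs once the task finally ends in the failed state.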
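The "once per failed task" rule can be illustrated with a small plain-Python model. This is only a sketch and not Airflow's scheduler: `run_with_retries`, `record_failure`, and the per-attempt outcomes are hypothetical names and data made up for the illustration, and the model assumes the failure callback fires only after every attempt has failed.

```python
from typing import Callable, Dict, List


def run_with_retries(
    task_id: str,
    attempt_results: List[bool],
    on_failure: Callable[[Dict[str, str]], None],
) -> bool:
    """Toy model: walk through the attempts; call on_failure once if all fail."""
    for succeeded in attempt_results:
        if succeeded:
            return True
    # Every attempt failed, so the task ends in a failed state: one callback.
    on_failure({"task_id": task_id})
    return False


failed_ids: List[str] = []


def record_failure(context: Dict[str, str]) -> None:
    """Stand-in for a failure callback: remember which task failed."""
    failed_ids.append(context["task_id"])


# Hypothetical outcome of each attempt (True means the attempt succeeded).
outcomes = {
    "task1": [False, False, False],  # fails on every attempt
    "task2": [False, True],          # recovers on a retry
    "task3": [True],                 # succeeds on the first attempt
}

for task_id, results in outcomes.items():
    run_with_retries(task_id, results, record_failure)

print(failed_ids)  # ['task1']
```

Only task1 reaches the callback, even though it made three attempts, so the callback count is bounded by the number of tasks rather than the number of attempts.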
As the number of tasks increases, the number of failure callbacks that might run grows in a straight line.
| Input Size (n tasks) | Approx. Failure Callback Runs |
|---|---|
| 10 | Up to 10 |
| 100 | Up to 100 |
| 1000 | Up to 1000 |
Pattern observation: The number of failure callbacks grows directly with the number of tasks.
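The table can be reproduced with a short counting sketch. It is plain Python rather than Airflow code, and `count_callback_runs` is a hypothetical helper that treats each failed task as exactly one callback invocation.

```python
from typing import List


def count_callback_runs(task_failed: List[bool]) -> int:
    """Count callback invocations: one per failed task in a single DAG run."""
    runs = 0
    for failed in task_failed:
        if failed:
            runs += 1  # stands in for one on_failure_callback call
    return runs


for n in (10, 100, 1000):
    worst_case = count_callback_runs([True] * n)  # every task fails
    half_fail = count_callback_runs([i % 2 == 0 for i in range(n)])
    print(n, worst_case, half_fail)
```

The worst-case column matches the "Up to n" entries in the table, and the half-failing column shows that the actual count follows the number of failures, not just the DAG size.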
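Time Complexity: O(n)
Here n is the number of tasks in the DAG. In the worst case every task fails and the callback runs n times. More precisely, the work is proportional to the number of tasks that actually fail, assuming each callback call does a constant amount of work.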
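The linear bound depends on each callback call being cheap. The sketch below is a hypothetical cost model, not a measurement of Airflow: `total_callback_steps` and its `scans_all_tasks` flag are invented for illustration. It compares a callback that does constant work (like the print call in the example) with one that inspects every task in the DAG each time it runs.

```python
def total_callback_steps(n_tasks: int, n_failed: int, scans_all_tasks: bool) -> int:
    """Count abstract work steps across all callback runs in one DAG run."""
    if not 0 <= n_failed <= n_tasks:
        raise ValueError("n_failed must be between 0 and n_tasks")
    # A constant-work callback costs 1 step; a scanning callback costs n_tasks.
    steps_per_callback = n_tasks if scans_all_tasks else 1
    return n_failed * steps_per_callback


for n in (10, 100, 1000):
    constant = total_callback_steps(n, n, scans_all_tasks=False)
    scanning = total_callback_steps(n, n, scans_all_tasks=True)
    print(n, constant, scanning)
```

Under this model the constant-work callback stays at O(n) total steps, while a callback that scans all tasks would push the worst case toward O(n^2).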
[X] Wrong: "Failure callbacks run once for the whole DAG regardless of how many tasks fail."
[OK] Correct: A callback passed to an operator is a task-level callback. Each task can fail independently, so each failure triggers its own callback, and the calls add up as more tasks fail.
Knowing how failure callbacks scale helps you design reliable workflows. It shows how much extra work a failing DAG run can generate as your DAG grows, which is worth checking before you attach the same callback to hundreds of tasks.
"What if the failure callback triggers another task that also has a failure callback? How would the time complexity change?"