Trigger rules (all_success, one_success, none_failed) in Apache Airflow - Time & Space Complexity
When using trigger rules in Airflow, it's important to understand how the number of upstream tasks affects the time the scheduler needs to decide whether a task can run.
We want to know how that check time grows as more upstream tasks complete before the next task runs.
Analyze the time complexity of this Airflow task trigger rule check.
```python
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.trigger_rule import TriggerRule
from datetime import datetime

dag = DAG('example_trigger_rule', start_date=datetime(2024, 1, 1))

start = DummyOperator(task_id='start', dag=dag)
check = DummyOperator(
    task_id='check',
    dag=dag,
    trigger_rule=TriggerRule.ALL_SUCCESS,
)

start >> check
```
This code defines a task with a trigger rule that waits for all upstream tasks to succeed before running.
When the Airflow scheduler decides whether the task can run, it checks the state of every upstream task.
- Primary operation: Checking each upstream task's state.
- How many times: Once per upstream task, so as many times as there are upstream tasks.
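The check described above can be sketched in plain Python. This is a simplified model of the ALL_SUCCESS rule, not Airflow's actual scheduler code; the `all_success` helper and the string state names are assumptions made for illustration.

```python
# Simplified model of the ALL_SUCCESS trigger rule: the task may run only
# if every upstream task instance has reached the 'success' state.
def all_success(upstream_states):
    # One state check per upstream task: O(n) in the number of upstream tasks.
    return all(state == 'success' for state in upstream_states)

print(all_success(['success'] * 3))                   # every upstream succeeded
print(all_success(['success', 'failed', 'success']))  # one failure blocks the task
```

Note that `all()` short-circuits on the first non-success state, but in the worst case (all upstream tasks succeeded) every one of the n states must still be inspected.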
As the number of upstream tasks increases, the time to check their states grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
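The table's operation counts can be reproduced with a small counting sketch. The `count_checks` helper is hypothetical and simply tallies one check per upstream task in the all-success case:

```python
# Count the state checks performed by a simulated ALL_SUCCESS evaluation
# when every upstream task has succeeded (the worst case for this rule).
def count_checks(n):
    checks = 0
    for state in ['success'] * n:  # one recorded state per upstream task
        checks += 1
        if state != 'success':     # a failure would end the scan early
            break
    return checks

for n in (10, 100, 1000):
    print(n, count_checks(n))  # checks grow linearly with n
```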
Pattern observation: The checking time grows linearly with the number of upstream tasks.
Time Complexity: O(n)
This means the time to decide whether a task runs grows linearly with the number of upstream tasks it depends on.
[X] Wrong: "The trigger rule check happens instantly no matter how many tasks there are."
[OK] Correct: Each upstream task's state must be checked, so more tasks mean more checks and more time.
Understanding how trigger rules scale helps you design workflows that run efficiently and predictably as they grow.
What if the trigger rule was changed to one_success instead of all_success? How would the time complexity change?
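One way to reason about this, sketched as a simplified model rather than Airflow's actual implementation (the `one_success` helper is an assumption for illustration): a ONE_SUCCESS check can stop as soon as it finds a single successful upstream task, so its best case is O(1), while its worst case, when no upstream task has succeeded or the only success is checked last, remains O(n).

```python
# Simplified model of the ONE_SUCCESS trigger rule: the task may run as soon
# as any upstream task has succeeded. any() short-circuits, so the best case
# inspects one state (O(1)); the worst case still inspects all n states (O(n)).
def one_success(upstream_states):
    return any(state == 'success' for state in upstream_states)

print(one_success(['failed', 'success', 'failed']))  # stops after the second check
print(one_success(['failed'] * 3))                   # must inspect all n states
```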