Timetables for complex schedules in Apache Airflow - Time & Space Complexity
When working with complex schedules in Airflow, it is important to understand how the time needed to compute the next run grows as the schedule becomes more detailed.
Analyze the time complexity of the following Airflow schedule code.
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.dates import days_ago
from airflow.timetables.base import DagRunInfo, Timetable


class ComplexTimetable(Timetable):
    # Airflow calls this method with keyword-only arguments and expects a
    # single DagRunInfo (or None) to be returned, not a generator.
    def next_dagrun_info(self, *, last_automated_data_interval, restriction):
        # Simulate complex schedule calculation
        for i in range(1000):
            pass  # complex logic here
        return DagRunInfo.interval(start=days_ago(1), end=days_ago(0))


# Pass only `timetable`; Airflow 2.x rejects it when combined with `schedule_interval`.
# A custom timetable must also be registered through an Airflow plugin.
with DAG(dag_id='complex_schedule', start_date=days_ago(2), timetable=ComplexTimetable()) as dag:
    DummyOperator(task_id='start')
This code defines a custom timetable that simulates a complex schedule by running a loop before returning the next run interval.
- Primary operation: A loop running 1000 times inside the timetable method.
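- How many times: The full loop runs once each time Airflow calls next_dagrun_info to calculate the next run time.
As written, the loop bound is fixed at 1000, so every call does the same amount of work. To analyze growth, imagine instead that the loop count is a variable n that increases with schedule complexity.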
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 operations |
| 100 | 100 operations |
| 1000 | 1000 operations |
Pattern observation: The operations grow directly with the input size, meaning more complexity means more work.
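The pattern in the table can be reproduced with a minimal sketch in plain Python. It does not use Airflow; `count_schedule_steps` is a hypothetical helper, and the loop body stands in for one unit of schedule logic.

```python
def count_schedule_steps(n: int) -> int:
    """Count the loop iterations performed for a schedule of size n."""
    steps = 0
    for _ in range(n):
        steps += 1  # stands in for one unit of schedule logic
    return steps


# Print the step count for each input size used in the table.
for n in (10, 100, 1000):
    print(n, count_schedule_steps(n))
```

Each tenfold increase in n produces a tenfold increase in the step count, which is the linear growth the table describes.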
Time Complexity: O(n)
This means the time to compute the next run grows linearly with n, the number of steps the schedule logic performs.
[X] Wrong: "The timetable calculation always takes the same time no matter how complex the schedule is."
[OK] Correct: More complex schedules often require more steps to find the next run time, so the time grows with complexity.
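As one illustration of why the work can grow, consider a schedule defined by an explicit, unsorted list of run times. The sketch below is plain Python rather than Airflow code, and `next_event_after` is a hypothetical helper: it scans every entry to find the earliest one after a given moment and reports how many entries it examined.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def next_event_after(
    events: List[datetime], after: datetime
) -> Tuple[Optional[datetime], int]:
    """Return the earliest event later than `after` and the entries examined."""
    examined = 0
    best: Optional[datetime] = None
    for event in events:
        examined += 1
        # Keep the event only if it is in the future and earlier than the best so far.
        if event > after and (best is None or event < best):
            best = event
    return best, examined


base = datetime(2024, 1, 1)
small = [base + timedelta(days=d) for d in range(10)]
large = [base + timedelta(days=d) for d in range(1000)]

print(next_event_after(small, base + timedelta(days=3)))
print(next_event_after(large, base + timedelta(days=3)))
```

Both calls find the same next run time, but the first examines 10 entries and the second examines 1000, so the cost of a single lookup scales with the number of entries in the schedule.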
Understanding how schedule complexity affects calculation time helps you design efficient workflows and shows you can think about scaling in real projects.
"What if the timetable used nested loops that depend on two input sizes? How would the time complexity change?"
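One way to explore this question is to count the iterations directly. The sketch below is plain Python with a hypothetical helper, `count_nested_steps`, and assumes the inner loop runs in full for every pass of the outer loop.

```python
def count_nested_steps(n: int, m: int) -> int:
    """Count iterations of an outer loop of size n around an inner loop of size m."""
    steps = 0
    for _ in range(n):
        for _ in range(m):
            steps += 1  # one unit of schedule logic per (outer, inner) pair
    return steps


print(count_nested_steps(10, 20))
print(count_nested_steps(100, 20))
```

Under that assumption the count is the product n * m, so the time complexity becomes O(n * m), and O(n^2) when both sizes are the same.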