Task dependencies (>> and << operators) in Apache Airflow - Time & Space Complexity
When we set task dependencies in Airflow using the >> and << operators, we create relationships between tasks.
We want to understand how the time to set these dependencies grows as we add more tasks.
Analyze the time complexity of setting dependencies between tasks using the >> operator.
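from datetime import datetime
from airflow import DAG
# Note: newer Airflow 2.x releases deprecate DummyOperator in favor of
# airflow.operators.empty.EmptyOperator; the dependency syntax is the same.
from airflow.operators.dummy import DummyOperator
dag = DAG('example_dag', start_date=datetime(2023, 1, 1))
# Three placeholder tasks that do no work
task1 = DummyOperator(task_id='task1', dag=dag)
task2 = DummyOperator(task_id='task2', dag=dag)
task3 = DummyOperator(task_id='task3', dag=dag)
# Set dependencies: two >> operations create two links
task1 >> task2 >> task3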
This code creates three tasks and sets dependencies so task1 runs before task2, which runs before task3.
Look for repeated steps in setting dependencies.
- Primary operation: Linking one task to the next using the >> operator.
- How many times: Once per pair of tasks connected in the chain.
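To make the "one link per >> operation" idea concrete, here is a minimal sketch of how chaining can work. `ToyTask` is a hypothetical stand-in written for this illustration, not Airflow's actual operator class; it only assumes that each operator records one link and returns the right-hand operand so the chain can continue.

```python
# Toy model of ">>" and "<<" chaining. This is an illustration only,
# not Airflow's real implementation.
class ToyTask:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that run after this one
        self.upstream = []    # tasks that run before this one

    def __rshift__(self, other):
        # "self >> other": other runs after self. One link is recorded.
        self.downstream.append(other)
        other.upstream.append(self)
        return other  # returning the right operand lets the chain continue

    def __lshift__(self, other):
        # "self << other": other runs before self. One link is recorded.
        other.downstream.append(self)
        self.upstream.append(other)
        return other


a, b, c = ToyTask("a"), ToyTask("b"), ToyTask("c")
a >> b >> c  # two operations: a -> b, then b -> c

x, y, z = ToyTask("x"), ToyTask("y"), ToyTask("z")
z << y << x  # same ordering written in reverse: x -> y -> z
```

Each operator call does a fixed amount of work in this sketch, so the total cost depends only on how many operators appear in the chain.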
As the number of tasks grows, the number of dependency links grows too.
| Input Size (n tasks) | Approx. Operations (links) |
|---|---|
| 10 | 9 |
| 100 | 99 |
| 1000 | 999 |
Pattern observation: A chain of n tasks needs exactly n - 1 links, so the number of operations grows linearly with the number of tasks.
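The counts in the table can be reproduced with a short sketch that does not need Airflow at all. The helper name `chain_links` and the use of plain strings in place of real tasks are assumptions made for this illustration.

```python
def chain_links(n):
    """Return the (upstream, downstream) pairs for a chain of n tasks."""
    tasks = [f"task{i}" for i in range(1, n + 1)]
    # One link per adjacent pair, as in task1 >> task2 >> ... >> taskN.
    return list(zip(tasks, tasks[1:]))


for n in (10, 100, 1000):
    print(n, len(chain_links(n)))
# prints:
# 10 9
# 100 99
# 1000 999
```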
Time Complexity: O(n)
This means the time to set dependencies grows in direct proportion to the number of tasks: doubling the tasks in the chain roughly doubles the work.
[X] Wrong: "Setting dependencies between tasks takes the same time no matter how many tasks there are."
[OK] Correct: Each dependency link is a separate step, so more tasks mean more links and more time.
Dependencies are set when the DAG file is parsed, so understanding how they scale helps you keep DAG definitions quick to load as a workflow grows to many tasks.
"What if we set dependencies between every task and every other task (all-to-all)? How would the time complexity change?"
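One way to explore this question is to count the links directly. The sketch below assumes "all-to-all" means every pair of tasks is linked once, from the earlier task to the later one, since a DAG cannot contain cycles. The helper name `all_pairs_link_count` is hypothetical.

```python
from itertools import combinations


def all_pairs_link_count(n):
    """Count links when each of n tasks is linked to every later task."""
    # combinations yields each unordered pair once: n * (n - 1) / 2 pairs.
    return sum(1 for _ in combinations(range(n), 2))


for n in (10, 100, 1000):
    print(n, all_pairs_link_count(n))
# prints:
# 10 45
# 100 4950
# 1000 499500
```

Under that assumption the link count is n(n - 1)/2, so the work grows quadratically, O(n^2), rather than linearly as in the simple chain.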