Unit testing DAGs in Apache Airflow - Time & Space Complexity
When we test Airflow DAGs, we want to know how the time to run tests changes as the DAG grows.
We ask: how does adding more DAG files to the project affect the test time?
Analyze the time complexity of the following unit test for an Airflow DAG.
```python
from airflow.models import DagBag
import unittest


class TestMyDAG(unittest.TestCase):
    def test_dag_loads(self):
        # DagBag() parses every file in the DAGs folder,
        # so this step's cost grows with the number of DAG files.
        dagbag = DagBag()
        dag = dagbag.get_dag('my_dag')
        self.assertIsNotNone(dag)
        self.assertFalse(dagbag.import_errors)
```
This test loads every DAG in the DAGs folder and checks that 'my_dag' exists and imports without errors.
Look for repeated steps in the test process.
- Primary operation: Loading all DAG files in the DagBag.
- How many times: Once per test run, but loading involves reading each DAG file.
As the number of DAG files increases, loading takes longer.
| Input Size (number of DAG files) | Approx. Operations (loading DAGs) |
|---|---|
| 10 | 10 file reads and parses |
| 100 | 100 file reads and parses |
| 1000 | 1000 file reads and parses |
Pattern observation: The time grows roughly in direct proportion to the number of DAG files.
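This linear pattern can be demonstrated with a small toy experiment (plain Python, not Airflow itself): parsing n small Python files, the way `DagBag` must import each DAG file, does work proportional to n. The file names and contents below are illustrative.

```python
import ast
import pathlib
import tempfile


def parse_all(folder: str) -> int:
    """Read and parse every .py file in `folder`; return the file count."""
    count = 0
    for path in sorted(pathlib.Path(folder).glob("*.py")):
        ast.parse(path.read_text())  # stand-in for importing one DAG file
        count += 1
    return count


with tempfile.TemporaryDirectory() as folder:
    # Create 100 tiny "DAG files"; loading must touch each one exactly once.
    for i in range(100):
        pathlib.Path(folder, f"dag_{i}.py").write_text("x = 1\n")
    print(parse_all(folder))  # → 100
```

Doubling the number of files doubles the number of read-and-parse steps, which is exactly the O(n) behavior in the table above.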
Time Complexity: O(n)
This means test time grows linearly as the number of DAG files increases.
[X] Wrong: "Unit testing a DAG always takes constant time regardless of DAG size."
[OK] Correct: Loading the DAGs requires reading each file, so more DAGs mean more work and longer test time.
Understanding how test time grows helps you write efficient tests and manage large Airflow projects confidently.
"What if we only loaded a single DAG file instead of all DAGs? How would the time complexity change?"