Creating a basic DAG file in Apache Airflow - Performance & Efficiency
When we create a basic DAG file in Airflow, we want to know how the time Airflow spends parsing and building that DAG grows as we add more tasks.
We ask: How does adding more tasks affect the total work Airflow does?
Analyze the time complexity of the following code snippet.
```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

# Default arguments applied to every task in the DAG
default_args = {'start_date': datetime(2024, 1, 1)}

# One DAG object, built each time the file is parsed
dag = DAG('basic_dag', default_args=default_args, schedule_interval='@daily')

# Two tasks, each defined exactly once
task1 = BashOperator(task_id='task1', bash_command='echo Hello', dag=dag)
task2 = BashOperator(task_id='task2', bash_command='echo World', dag=dag)

# One dependency: task1 runs before task2
task1 >> task2
```
This code creates a DAG with two tasks that run simple shell commands in order.
- Primary operation: Defining tasks and their dependencies in the DAG.
- How many times: Each task is defined once; no loops or recursion in this snippet.
As you add more tasks, the number of operations grows linearly: one definition (and at most one dependency) per task.
| Input Size (n tasks) | Approx. Operations |
|---|---|
| 10 | 10 task definitions |
| 100 | 100 task definitions |
| 1000 | 1000 task definitions |
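The linear pattern in the table can be reproduced with a toy counting model. This is plain Python standing in for Airflow, not the real library; `build_dag` is a hypothetical helper that counts one operation per task definition and one per dependency in a simple chain:

```python
def build_dag(n_tasks):
    """Simulate defining n tasks chained linearly; return the operation count."""
    ops = 0
    tasks = []
    for i in range(n_tasks):
        tasks.append({"task_id": f"task{i}"})  # one definition per task
        ops += 1
    for a, b in zip(tasks, tasks[1:]):  # task_i >> task_{i+1}
        ops += 1                        # one edge per adjacent pair
    return ops

for n in (10, 100, 1000):
    print(n, build_dag(n))  # n definitions + (n - 1) edges: linear growth
```

Doubling the task count roughly doubles the operation count, which is exactly the O(n) behavior described above.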
Pattern observation: each additional task adds a constant amount of work, so total work scales linearly with task count.
Time Complexity: O(n)
This means the time to create the DAG grows directly with the number of tasks.
[X] Wrong: "Adding more tasks will make the DAG creation time grow much faster, like squared or exponential."
[OK] Correct: Each task is defined once without nested loops, so the work grows evenly, not faster.
Understanding how task count affects DAG creation helps you design workflows that scale well and run smoothly.
"What if we added nested loops to create tasks dynamically? How would the time complexity change?"