Manual triggers and parameters in Apache Airflow - Time & Space Complexity
When manually triggering Airflow DAGs with parameters, it is important to understand how the number of parameters affects execution time.
The question is how the amount of work the task performs grows as more parameters are passed in.
Analyze the time complexity of the following Airflow DAG trigger code snippet.
```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def process_param(param):
    print(f"Processing {param}")

def run(**kwargs):
    # Read the list of parameters passed via the trigger configuration
    params = kwargs.get('dag_run').conf.get('params', [])
    for p in params:
        process_param(p)

dag = DAG('manual_param_dag', start_date=datetime(2024, 1, 1))

run_task = PythonOperator(
    task_id='run',
    python_callable=run,
    dag=dag,
)
```
This code runs a task that processes each parameter passed when manually triggering the DAG.
Look for loops or repeated actions in the code.
- Primary operation: Loop over the list of parameters to process each one.
- How many times: Once for each parameter passed in the manual trigger.
As the number of parameters increases, the task runs the processing step more times.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 processing calls |
| 100 | 100 processing calls |
| 1000 | 1000 processing calls |
Pattern observation: The number of operations grows directly with the number of parameters.
Time Complexity: O(n)
This means the time to complete the task grows linearly with the number of parameters given. Space complexity is likewise O(n), since the `params` list pulled from the trigger configuration holds one entry per parameter.
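The linear pattern in the table can be checked with a small counting sketch (a simplified stand-in for the task, not Airflow code): count how many processing calls one trigger would make for a given input size.

```python
def count_processing_calls(params):
    """Count how many times process_param would be invoked for one trigger."""
    calls = 0
    for _ in params:  # one processing call per parameter
        calls += 1
    return calls

# Matches the table above: operations grow one-for-one with input size.
for n in (10, 100, 1000):
    print(n, count_processing_calls(range(n)))
```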
[X] Wrong: "Adding more parameters won't affect the task's run time much."
[OK] Correct: Each parameter causes the task to do more work, so more parameters mean more time needed.
Understanding how input size affects execution helps you explain system behavior clearly and shows you can reason about scaling in real projects.
"What if the processing function itself called another loop over a list inside each parameter? How would that change the time complexity?"