SimpleHttpOperator for API calls in Apache Airflow - Time & Space Complexity
When using SimpleHttpOperator in Airflow, it helps to understand how the total execution time of a DAG grows as the number of API calls increases.
Analyze the time complexity of the following Airflow DAG snippet using SimpleHttpOperator.
from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator
from datetime import datetime

default_args = {'start_date': datetime(2024, 1, 1)}
dag = DAG('api_call_dag', default_args=default_args, schedule_interval='@daily')

tasks = []
for i in range(5):
    task = SimpleHttpOperator(
        task_id=f'call_api_{i}',
        method='GET',
        http_conn_id='my_api',
        endpoint=f'/data/{i}',
        dag=dag
    )
    tasks.append(task)

if tasks:
    for i in range(1, len(tasks)):
        tasks[i-1] >> tasks[i]
This code creates 5 tasks, each making one GET request to a different endpoint, and chains them so that they run one after another.
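To make the chaining step concrete, here is a minimal sketch in plain Python (no Airflow required) of the dependency pairs that the `tasks[i-1] >> tasks[i]` loop sets up. The `task_ids` list is a hypothetical stand-in for the operator objects, used only for illustration.

```python
# Hypothetical stand-ins for the five task IDs created by the DAG loop.
task_ids = [f"call_api_{i}" for i in range(5)]

# Pair each task with its successor, mirroring tasks[i-1] >> tasks[i].
edges = list(zip(task_ids, task_ids[1:]))

for upstream, downstream in edges:
    print(f"{upstream} >> {downstream}")

# A linear chain of n tasks has n - 1 dependency edges.
print(len(edges))
```

Because every task has exactly one upstream task (except the first), no two of these tasks can run at the same time.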
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Each SimpleHttpOperator task makes one HTTP GET request.
- How many times: The loop runs 5 times when the DAG file is parsed, creating 5 tasks; each task makes one API call when it runs, for 5 calls per DAG run.
Because the tasks are chained and run one at a time, the total execution time grows roughly in direct proportion to the number of API calls.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 API calls |
| 100 | 100 API calls |
| 1000 | 1000 API calls |
Pattern observation: Doubling the number of API calls roughly doubles the total work done.
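The doubling pattern can be checked with a rough model. The sketch below assumes every call takes the same fixed time; the function name `sequential_runtime` and the 0.5-second latency are illustrative assumptions, not measurements, and real runs also include Airflow scheduling overhead per task.

```python
def sequential_runtime(n_calls: int, seconds_per_call: float = 0.5) -> float:
    """Estimate wall-clock time when calls run strictly one after another.

    seconds_per_call is an assumed average latency, not a measured value.
    """
    return n_calls * seconds_per_call


# Same input sizes as the table above.
for n in (10, 100, 1000):
    print(f"{n} calls -> about {sequential_runtime(n)} seconds")
```

Under this model, doubling `n_calls` doubles the estimate, which matches the pattern in the table.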
Time Complexity: O(n)
This means the total time grows linearly with the number of API calls made.
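As a brief sketch of why the bound is linear, let $t_i$ be the duration of the $i$-th task and assume each is bounded by some constant $t_{\max}$ that does not depend on $n$ (these symbols are introduced here for illustration only). For a sequential chain, the durations simply add up:

```latex
T(n) \;=\; \sum_{i=1}^{n} t_i \;\le\; n \cdot t_{\max}
\quad\Longrightarrow\quad T(n) = O(n)
```

The assumption matters: if the API slowed down as more requests arrived, $t_{\max}$ would no longer be constant and the growth could be worse than linear.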
[X] Wrong: "Adding more API calls won't affect total execution time much because they run fast."
[OK] Correct: Each API call takes time, so more calls add up and increase total execution time linearly.
Understanding how task count affects execution time helps you design efficient workflows and estimate runtimes in real projects.
What if we changed the tasks to run in parallel instead of sequentially? How would the time complexity change?