Cost optimization for cloud resources in Apache Airflow - Time & Space Complexity
When managing cloud resources with Airflow, it's important to understand how the running time of a task grows as the number of resources it manages increases. Knowing how time and cost scale tells you whether an automation will still be practical as your fleet grows.
Analyze the time complexity of the following Airflow DAG task that checks and optimizes cloud resources.
```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def optimize_resources(resource_list):
    for resource in resource_list:
        if resource.is_idle():
            resource.shutdown()

dag = DAG(
    'cloud_cost_optimization',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
)

t1 = PythonOperator(
    task_id='optimize',
    python_callable=optimize_resources,
    # In real use these would be resource objects exposing
    # is_idle() and shutdown(), not plain strings.
    op_args=[['vm1', 'vm2', 'vm3', 'vm4']],
    dag=dag,
)
```
This code loops through a list of cloud resources and shuts down those that are idle to save cost.
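To make the loop concrete, here is a minimal sketch of what the resource objects might look like. The `Resource` class, its `idle` flag, and its `running` state are assumptions for illustration; real resources would come from a cloud provider's SDK.

```python
class Resource:
    """A toy stand-in for a cloud resource (assumed shape, not a real SDK)."""

    def __init__(self, name, idle):
        self.name = name
        self.idle = idle
        self.running = True

    def is_idle(self):
        return self.idle

    def shutdown(self):
        self.running = False

def optimize_resources(resource_list):
    # One pass over the list: each resource is checked exactly once.
    for resource in resource_list:
        if resource.is_idle():
            resource.shutdown()

resources = [Resource('vm1', idle=True), Resource('vm2', idle=False)]
optimize_resources(resources)
# 'vm1' is idle, so it gets shut down; 'vm2' keeps running.
```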
Look at what repeats in the code.
- Primary operation: Looping through each resource in the list.
- How many times: Once for each resource in the input list.
As the number of resources grows, the time to check and optimize grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks and possible shutdowns |
| 100 | 100 checks and possible shutdowns |
| 1000 | 1000 checks and possible shutdowns |
Pattern observation: The work grows directly with the number of resources.
Time Complexity: O(n)
This means the time to optimize grows linearly, in a straight line, as you add more resources.
Space Complexity: O(1) auxiliary. The loop checks resources in place and stores nothing beyond the input list itself.
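The linear pattern from the table can be verified with a quick sketch outside Airflow. The `count_checks` helper below is illustrative, not part of the original DAG; it just counts how many is_idle()-style checks the loop performs for a given input size.

```python
def count_checks(resource_flags):
    """Count one check per resource, mirroring the loop in optimize_resources."""
    checks = 0
    for is_idle in resource_flags:  # one is_idle()-style check per resource
        checks += 1
    return checks

for n in (10, 100, 1000):
    flags = [True] * n
    print(n, count_checks(flags))  # check count grows in lockstep with n
```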
[X] Wrong: "Checking all resources happens instantly no matter how many there are."
[OK] Correct: Each resource needs to be checked one by one, so more resources mean more time.
Understanding how task time grows with resource count helps you design efficient cloud workflows and shows that you can reason about scaling in real projects.
"What if the optimize_resources function used parallel tasks for each resource? How would the time complexity change?"