Apache Airflow · DevOps · ~10 mins

Cost optimization for cloud resources in Apache Airflow - Commands & Configuration

Introduction
Cloud resources can become expensive if left running unchecked. Cost optimization means using only what you need and stopping or scaling down resources when they are not required.
- When you want to automatically stop cloud servers during nights or weekends to save costs
- When you need to scale down resources during low-traffic periods to reduce bills
- When you want to track and report cloud spending regularly to avoid surprises
- When you want to clean up unused or idle cloud resources automatically
- When you want to schedule resource usage only during business hours
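The last use case, restricting work to business hours, can be handled with a guard callable. The function name and the 08:00–18:00 weekday window below are illustrative assumptions; in Airflow you could pass such a callable to a ShortCircuitOperator so downstream tasks are skipped off-hours.

```python
from datetime import datetime, time

def within_business_hours(now=None):
    """Return True only on weekdays between 08:00 and 18:00.

    Sketch of a guard callable; with Airflow, a ShortCircuitOperator
    wrapping this function would skip downstream tasks off-hours.
    """
    now = now or datetime.now()
    is_weekday = now.weekday() < 5  # Monday=0 .. Friday=4
    return is_weekday and time(8, 0) <= now.time() < time(18, 0)

# Example checks with fixed timestamps:
print(within_business_hours(datetime(2024, 6, 3, 10, 0)))  # Monday 10:00 -> True
print(within_business_hours(datetime(2024, 6, 1, 10, 0)))  # Saturday 10:00 -> False
```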
Config File - cost_optimization_dag.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

def stop_idle_resources():
    print("Stopping idle cloud resources to save cost...")
    # Here you would add code to call cloud APIs to stop resources

def report_costs():
    print("Generating cloud cost report...")
    # Here you would add code to fetch and log cost data

def cleanup_unused_resources():
    print("Cleaning up unused cloud resources...")
    # Here you would add code to delete unused resources

with DAG(
    'cost_optimization',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2024, 1, 1),
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    },
    schedule_interval='0 22 * * *',  # Every day at 10 PM
    catchup=False
) as dag:

    stop_resources = PythonOperator(
        task_id='stop_idle_resources',
        python_callable=stop_idle_resources
    )

    generate_report = PythonOperator(
        task_id='report_costs',
        python_callable=report_costs
    )

    cleanup_resources = PythonOperator(
        task_id='cleanup_unused_resources',
        python_callable=cleanup_unused_resources
    )

    stop_resources >> generate_report >> cleanup_resources

This Airflow DAG runs daily at 10 PM to perform cost optimization tasks.

stop_idle_resources: Stops cloud resources that are idle to save money.

report_costs: Generates a report of cloud spending for monitoring.

cleanup_unused_resources: Deletes unused resources to avoid unnecessary charges.

The tasks run in sequence: first stop idle resources, then report costs, and finally clean up.
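The stop_idle_resources placeholder above would normally query your cloud provider's monitoring API. As a provider-neutral sketch, where the resource names, sample data, and 5% CPU threshold are assumptions for illustration, the selection logic might look like:

```python
def select_idle_resources(resources, cpu_threshold=5.0):
    """Pick resources whose average CPU is below the threshold.

    `resources` is a list of dicts like {"id": ..., "avg_cpu": ...},
    standing in for data you would fetch from a cloud monitoring API
    (e.g. CloudWatch or Cloud Monitoring).
    """
    return [r["id"] for r in resources if r["avg_cpu"] < cpu_threshold]

def stop_idle_resources():
    # In a real DAG, replace this sample data with an API call.
    sample = [
        {"id": "vm-web-1", "avg_cpu": 2.1},
        {"id": "vm-batch-1", "avg_cpu": 47.5},
        {"id": "vm-dev-1", "avg_cpu": 0.4},
    ]
    idle = select_idle_resources(sample)
    print(f"Stopping idle cloud resources to save cost: {idle}")
    # Here you would call the provider API to stop each resource in `idle`.

stop_idle_resources()
```

Keeping the selection logic in its own function makes it easy to unit-test without touching a real cloud account.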

Commands
Lists all available DAGs in Airflow to verify the cost optimization DAG is recognized.
Terminal
airflow dags list
Expected Output
cost_optimization
example_bash_operator
example_python_operator
Manually triggers the cost optimization DAG to run the cost-saving tasks immediately.
Terminal
airflow dags trigger cost_optimization
Expected Output
Created <DagRun cost_optimization @ 2024-06-01T12:00:00+00:00: manual__2024-06-01T12:00:00+00:00, externally triggered: True>
Lists all tasks in the cost optimization DAG to see the steps involved.
Terminal
airflow tasks list cost_optimization
Expected Output
stop_idle_resources
report_costs
cleanup_unused_resources
Tests the stop_idle_resources task for the given date to check it runs correctly without executing the full DAG.
Terminal
airflow tasks test cost_optimization stop_idle_resources 2024-06-01
Expected Output
[2024-06-01 12:00:00,000] {taskinstance.py:876} INFO - Executing <Task(PythonOperator): stop_idle_resources> on 2024-06-01
Stopping idle cloud resources to save cost...
[2024-06-01 12:00:00,100] {taskinstance.py:1020} INFO - Marking task as SUCCESS. dag_id=cost_optimization, task_id=stop_idle_resources, execution_date=2024-06-01 00:00:00+00:00, start_date=2024-06-01 12:00:00, end_date=2024-06-01 12:00:00
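To flesh out the report_costs step, here is a minimal aggregation sketch. The line-item fields are assumptions; real rows would come from your provider's billing export or cost API.

```python
from collections import defaultdict

def summarize_costs(line_items):
    """Aggregate billing line items into a per-service total."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["service"]] += item["cost_usd"]
    return dict(totals)

def report_costs():
    # Stand-in for rows fetched from a billing export.
    items = [
        {"service": "compute", "cost_usd": 120.50},
        {"service": "storage", "cost_usd": 30.25},
        {"service": "compute", "cost_usd": 15.00},
    ]
    for service, total in sorted(summarize_costs(items).items()):
        print(f"{service}: ${total:.2f}")

report_costs()
# prints:
# compute: $135.50
# storage: $30.25
```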
Key Concept

If you remember nothing else from this pattern, remember: automate stopping and cleaning cloud resources during off-hours to save money without manual effort.

Common Mistakes
Mistake: Not setting the DAG schedule or start date correctly.
Consequence: The DAG will not run automatically, or may run at unexpected times and miss the cost-saving windows.
Fix: Set a clear start_date and schedule_interval in the DAG so it runs during off-hours.
Mistake: Running tasks without proper cloud API permissions.
Consequence: Tasks that stop or delete resources will fail, so no cost savings happen.
Fix: Ensure Airflow has credentials with permissions to manage cloud resources.
Mistake: Not verifying task success after triggering the DAG.
Consequence: You might think resources are stopped while tasks have failed silently.
Fix: Check Airflow logs and task status after each run to confirm success.
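The third mistake can be caught programmatically. As a sketch, assuming you have already fetched the task states for a run (e.g. via the Airflow REST API or the `airflow tasks states-for-dag-run` command), a simple check might be:

```python
def assert_all_succeeded(task_states):
    """Raise if any task in the run did not finish in state 'success'.

    `task_states` maps task_id -> state string, standing in for data
    pulled from the Airflow REST API or CLI.
    """
    failed = {t: s for t, s in task_states.items() if s != "success"}
    if failed:
        raise RuntimeError(f"Cost-saving tasks did not all succeed: {failed}")
    print("All cost optimization tasks succeeded.")

assert_all_succeeded({
    "stop_idle_resources": "success",
    "report_costs": "success",
    "cleanup_unused_resources": "success",
})
```

Wiring a check like this into monitoring means a silently failed cleanup run raises an alert instead of quietly costing money.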
Summary
- Create an Airflow DAG that runs cost optimization tasks like stopping idle resources and cleaning up unused ones.
- Use PythonOperator tasks to define each step and schedule the DAG to run during off-hours.
- Trigger the DAG manually or let it run on schedule, then verify task success to ensure the cost savings actually happened.