
SLA misses and notifications in Apache Airflow - Commands & Configuration

Introduction
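Sometimes tasks in Airflow take longer than expected. An SLA (service level agreement) is a deadline you set on a task, and Airflow 2.x records an SLA miss when the task has not finished by that deadline. Notifications alert you so you can fix problems quickly.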
When you want to be alerted if a data pipeline task runs late.
When you need to monitor critical jobs that must finish on time.
When you want to send emails or messages if a task misses its deadline.
When you want to track SLA misses in the Airflow UI for reporting.
When you want to improve reliability by quickly reacting to delays.
Config File - sla_notification_dag.py
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago
from airflow.utils.email import send_email


# Called by Airflow 2.x when one or more tasks in this DAG miss their SLA.
# task_list and blocking_task_list arrive as preformatted strings, so the
# task IDs are read from the SLA-miss records in slas instead.
def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    missed = sorted({sla.task_id for sla in slas})
    subject = f"SLA Miss Alert for DAG: {dag.dag_id}"
    body = f"The following tasks missed their SLA: {', '.join(missed)}"
    send_email(to=["alert@example.com"], subject=subject, html_content=body)

# Define the DAG
with DAG(
    dag_id='sla_notification_dag',
    start_date=days_ago(1),
    schedule_interval='@daily',
    sla_miss_callback=sla_miss_callback,
    catchup=False
) as dag:

    task1 = BashOperator(
        task_id='task1',
        bash_command='sleep 10',
        sla=timedelta(seconds=5)  # must finish within 5 seconds of the DAG run start
    )

    task2 = BashOperator(
        task_id='task2',
        bash_command='echo "Task 2 runs quickly"',
        sla=timedelta(seconds=10)  # 10 seconds from the DAG run start, including the wait for task1
    )

    task1 >> task2

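This DAG defines two tasks with SLAs set using the sla parameter. An SLA is measured from the start of the DAG run, not from the moment the task itself starts, so task2's 10-second SLA also covers the time it spends waiting for task1. When Airflow detects that a task has not finished within its SLA, it calls the sla_miss_callback function.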

The callback sends an email alert listing the tasks that missed their SLA.

This notifies the team about delays soon after Airflow detects them. Sending email requires SMTP settings in the Airflow configuration.
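The message-building part of such a callback can be tried out without a running Airflow instance. The sketch below is only an illustration: MissRecord is a hypothetical stand-in for the SLA-miss records Airflow passes in the slas argument (assumed here to expose task_id and execution_date), and build_sla_alert is a hypothetical helper, not an Airflow function.

```python
import html
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MissRecord:
    """Hypothetical stand-in for one SLA-miss record from the `slas` argument."""
    task_id: str
    execution_date: datetime


def build_sla_alert(dag_id, misses):
    """Return (subject, html_body) describing the missed SLAs."""
    subject = f"SLA Miss Alert for DAG: {dag_id}"
    # One list item per missed task and run, escaped for safe use in HTML.
    items = "".join(
        f"<li>{html.escape(m.task_id)} (run {m.execution_date.isoformat()})</li>"
        for m in misses
    )
    body = f"<p>The following tasks missed their SLA:</p><ul>{items}</ul>"
    return subject, body


subject, body = build_sla_alert(
    "sla_notification_dag",
    [MissRecord("task1", datetime(2024, 6, 1)), MissRecord("task2", datetime(2024, 6, 1))],
)
print(subject)
print(body)
```

Inside a real callback, the two returned values could be passed to send_email as subject and html_content. Keeping the formatting in a separate function makes it easy to unit test.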

Commands
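This command starts a manual run of the DAG named 'sla_notification_dag' to confirm that it executes. Airflow checks SLAs only for scheduled runs, so a manually triggered run does not produce an SLA miss. To see the notification, unpause the DAG and wait for a scheduled run.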
Terminal
airflow dags trigger sla_notification_dag
Expected Output
{"dag_id": "sla_notification_dag", "run_id": "manual__2024-06-01T00:00:00+00:00", "state": "running"}
Lists all tasks in the DAG to verify the task names. The output contains task IDs only and does not show SLAs.
Terminal
airflow tasks list sla_notification_dag
Expected Output
task1
task2
Runs task1 on its own for the given date to check that the command works. This test mode does not evaluate SLAs, so it will not trigger a miss or call the callback.
Terminal
airflow tasks test sla_notification_dag task1 2024-06-01
Expected Output
[2024-06-01 00:00:00,000] {bash.py:123} INFO - Running command: sleep 10
[2024-06-01 00:00:10,000] {bash.py:130} INFO - Command exited with return code 0
Runs task2 on its own to verify that its command completes quickly. As with task1, no SLA check happens in this test mode.
Terminal
airflow tasks test sla_notification_dag task2 2024-06-01
Expected Output
[2024-06-01 00:00:00,000] {bash.py:123} INFO - Running command: echo "Task 2 runs quickly"
Task 2 runs quickly
[2024-06-01 00:00:00,100] {bash.py:130} INFO - Command exited with return code 0
Key Concept

If a task in a scheduled run has not finished within its SLA time, measured from the start of the DAG run, Airflow records an SLA miss and calls the sla_miss_callback to notify you.
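To make the timing rule concrete, here is a minimal sketch in plain Python that needs no Airflow installation. It assumes, following the Airflow 2.x documentation, that an SLA is measured from the DAG run's start time. The helpers sla_deadline and is_sla_missed are hypothetical names for illustration, and the model is simplified compared with what the scheduler actually does.

```python
from datetime import datetime, timedelta, timezone


def sla_deadline(run_start, sla):
    """Deadline for a task: the DAG run's start time plus the task's SLA."""
    return run_start + sla


def is_sla_missed(deadline, now, finished_at=None):
    """Simplified check: a finished task missed if it finished after the
    deadline; an unfinished task missed if the deadline has already passed."""
    if finished_at is not None:
        return finished_at > deadline
    return now > deadline


# Timeline modeled on the example DAG: task1 sleeps 10 s, task2 runs after it.
run_start = datetime(2024, 6, 2, 0, 0, 0, tzinfo=timezone.utc)
task1_finished = run_start + timedelta(seconds=10)
task2_finished = task1_finished + timedelta(seconds=1)
now = run_start + timedelta(seconds=30)

task1_missed = is_sla_missed(sla_deadline(run_start, timedelta(seconds=5)), now, task1_finished)
task2_missed = is_sla_missed(sla_deadline(run_start, timedelta(seconds=10)), now, task2_finished)
print(task1_missed, task2_missed)
```

Under this model both tasks miss their SLA: task2 itself is fast, but it cannot start until task1 has used up the full 10 seconds.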

Common Mistakes
Not setting the sla parameter on tasks.
Without SLA set, Airflow does not know when to trigger SLA miss notifications.
Always set the sla parameter with a timedelta on tasks you want to monitor.
Not defining sla_miss_callback in the DAG.
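Without the callback, SLA misses are still recorded and shown in the Airflow UI, but your custom notification logic never runs. Airflow may still email the addresses set in a task's email argument if SMTP is configured.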
Define sla_miss_callback in the DAG to send alerts like emails or messages.
Setting the SLA too short, causing false alerts.
Tasks will often finish later than an unrealistic SLA allows, causing unnecessary notifications.
Set realistic SLA times based on typical task durations, including the time a task spends waiting for upstream tasks.
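One way to pick a realistic value is to derive it from past runs. The sketch below is only a suggestion, not an Airflow feature: suggest_sla is a hypothetical helper, the sample numbers are made up for illustration, and the inputs are assumed to be the seconds between each DAG run's start and the task's completion.

```python
import math
from datetime import timedelta


def suggest_sla(completion_seconds, percentile=0.95, margin=1.2):
    """Suggest an SLA from past completion times (seconds after run start).

    Takes the nearest-rank percentile of the samples and adds a safety margin.
    """
    if not completion_seconds:
        raise ValueError("need at least one sample")
    ordered = sorted(completion_seconds)
    # Nearest-rank percentile: the smallest value covering `percentile` of samples.
    rank = max(1, math.ceil(percentile * len(ordered)))
    return timedelta(seconds=ordered[rank - 1] * margin)


# Illustrative samples only: one slow outlier among ten runs.
samples = [620, 640, 655, 700, 710, 730, 745, 760, 800, 1500]
print(suggest_sla(samples, percentile=0.9))
```

A lower percentile ignores rare outliers and gives a tighter SLA, while a higher one produces fewer alerts. Which trade-off fits depends on how costly a late task is.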
Summary
Set the sla parameter on tasks to define how long after the DAG run starts each task is allowed to finish.
Use sla_miss_callback in the DAG to send notifications when SLAs are missed.
Trigger and test tasks to verify that they run, then let a scheduled run execute to verify that SLA miss alerts work as expected.