Integration with PagerDuty and Slack in Apache Airflow - Time & Space Complexity
When integrating Airflow with PagerDuty and Slack, it's important to understand how the time to send alerts grows as the number of failed tasks increases, i.e., how the system handles more notifications as the workload grows.
Analyze the time complexity of the following Airflow task that sends alerts to PagerDuty and Slack for multiple failed tasks.
```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def send_alerts(failed_tasks):
    for task in failed_tasks:
        send_to_pagerduty(task)
        send_to_slack(task)

def send_to_pagerduty(task):
    pass  # sends alert to PagerDuty

def send_to_slack(task):
    pass  # sends alert to Slack

with DAG('alert_dag', start_date=datetime(2024, 1, 1)) as dag:
    alert = PythonOperator(
        task_id='send_alerts',
        python_callable=send_alerts,
        op_kwargs={'failed_tasks': ['task1', 'task2', 'task3']},
    )
```
This code loops over failed tasks and sends alerts to PagerDuty and Slack for each one.
Look at what repeats in the code:
- Primary operation: Looping over each failed task to send alerts.
- How many times: Once for each failed task in the list.
As the number of failed tasks grows, the number of alert sends grows the same way.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 20 (10 PagerDuty + 10 Slack sends) |
| 100 | 200 (100 PagerDuty + 100 Slack sends) |
| 1000 | 2000 (1000 PagerDuty + 1000 Slack sends) |
Pattern observation: The total alert sends grow directly with the number of failed tasks.
Time Complexity: O(n)
This means the time to send alerts grows linearly with the number of failed tasks.
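You can verify the linear growth directly, without Airflow, by replacing the senders with counting stubs (hypothetical stand-ins for the real API calls) and measuring how many send operations occur for each input size:

```python
# Minimal sketch: hypothetical stub senders that count operations
# instead of calling the real PagerDuty/Slack APIs.
send_count = 0

def send_to_pagerduty(task):
    global send_count
    send_count += 1  # stands in for one PagerDuty API call

def send_to_slack(task):
    global send_count
    send_count += 1  # stands in for one Slack API call

def send_alerts(failed_tasks):
    for task in failed_tasks:
        send_to_pagerduty(task)
        send_to_slack(task)

for n in (10, 100, 1000):
    send_count = 0
    send_alerts([f'task{i}' for i in range(n)])
    print(n, send_count)  # 2 sends per failed task: 20, 200, 2000
```

The printed counts match the table above: each failed task triggers exactly two sends, so total operations are 2n, which is O(n).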
[X] Wrong: "Sending alerts to PagerDuty and Slack happens all at once, so time stays the same no matter how many tasks fail."
[OK] Correct: Each alert requires a separate send operation, so more failed tasks mean more sends and more time.
Understanding how alert sending scales helps you design reliable workflows that handle many failures gracefully.
"What if we batch all failed tasks into a single alert message instead of sending one per task? How would the time complexity change?"