Kubernetes · DevOps · ~10 mins

Alerting with Prometheus Alertmanager in Kubernetes - Step-by-Step Execution

Process Flow - Alerting with Prometheus Alertmanager
Prometheus scrapes metrics
Prometheus evaluates alert rules
Alert fires if condition met
Alert sent to Alertmanager
Alertmanager groups and deduplicates alerts
Alertmanager sends notifications
Notification received by user or system
Prometheus collects metrics, checks alert rules, sends alerts to Alertmanager, which groups alerts and sends notifications.
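In Prometheus's own configuration, this flow is wired up by pointing Prometheus at a rule file and at an Alertmanager endpoint. A minimal sketch, assuming an in-cluster Alertmanager service and a node-exporter scrape target (all names and addresses below are illustrative, not from the original):

```yaml
# prometheus.yml -- minimal sketch; file paths and service names are assumptions
rule_files:
  - /etc/prometheus/rules/example-alerts.yml   # file containing the alert rules below

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager.monitoring.svc:9093   # assumed in-cluster Alertmanager service

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['node-exporter.monitoring.svc:9100']  # assumed metrics endpoint
```

With this in place, Prometheus scrapes the targets, evaluates the rule file on each evaluation interval, and forwards firing alerts to the listed Alertmanager.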
Execution Sample
Kubernetes
groups:
- name: example
  rules:
  - alert: HighCpuUsage
    expr: cpu_usage > 80
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "CPU usage is above 80%"
This alert rule fires if CPU usage is above 80% for 5 minutes, labeling it as critical.
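On the Alertmanager side, grouping, deduplication, and notification routing are controlled by its own configuration. A minimal sketch, assuming a Slack receiver (the webhook URL and channel are placeholders, not real values):

```yaml
# alertmanager.yml -- minimal sketch; receiver details are placeholders
route:
  group_by: ['alertname', 'severity']  # alerts sharing these labels are grouped together
  group_wait: 30s        # wait before sending the first notification for a new group
  group_interval: 5m     # wait before sending updates about the same group
  repeat_interval: 4h    # resend a still-firing alert after this interval
  receiver: 'slack-critical'

receivers:
  - name: 'slack-critical'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder webhook URL
        channel: '#alerts'
```

The `group_by` and interval settings are what prevent a burst of identical alerts from producing a burst of identical notifications.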
Process Table
| Step | Prometheus Metric | Alert Rule Condition | Condition Result | Alertmanager Action | Notification Sent |
|------|-------------------|----------------------|------------------|---------------------|-------------------|
| 1 | cpu_usage=75 | cpu_usage > 80 for 5m | False | No alert sent | No |
| 2 | cpu_usage=85 (for 3m) | cpu_usage > 80 for 5m | False (time not reached) | No alert sent | No |
| 3 | cpu_usage=85 (for 5m) | cpu_usage > 80 for 5m | True | Alert fired and sent | Yes |
| 4 | cpu_usage=85 (continued) | Alert active | True | Alertmanager groups alert | Notification sent |
| 5 | cpu_usage=70 | cpu_usage > 80 for 5m | False | Alert resolved | Resolved notification sent |
💡 The alert resolves when cpu_usage drops back below the threshold; the alert lifecycle ends.
Status Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 5 |
|----------|-------|--------------|--------------|--------------|--------------|--------------|
| cpu_usage | 70 | 75 | 85 | 85 | 85 | 70 |
| Alert State | Inactive | Inactive | Inactive | Firing | Firing | Resolved |
| Notification Sent | No | No | No | Yes | Yes | Yes (resolved) |
Key Moments - 3 Insights
Why doesn't the alert fire immediately when cpu_usage first goes above 80?
Because the alert rule requires cpu_usage > 80 to hold for 5 continuous minutes (see Steps 2 and 3 in the execution table). The condition must hold for the full duration before the alert fires.
What does Alertmanager do when it receives multiple alerts for the same issue?
Alertmanager groups and deduplicates alerts to avoid spamming notifications, as shown in Step 4, where it groups the alert before sending a notification.
How does the alert get resolved?
When the metric drops below the threshold (cpu_usage < 80), Prometheus marks the alert as resolved and Alertmanager sends a resolved notification (Step 5).
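Whether a resolved notification is actually delivered is per-receiver behavior: most Alertmanager receiver configurations accept a `send_resolved` flag controlling it. A sketch, with a hypothetical webhook endpoint:

```yaml
# alertmanager.yml fragment -- endpoint URL is a placeholder
receivers:
  - name: 'ops-webhook'
    webhook_configs:
      - url: 'http://example.internal/alert-hook'  # placeholder endpoint
        send_resolved: true   # also notify when the alert clears (Step 5)
```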
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table, at which step does the alert first fire?
A. Step 1
B. Step 3
C. Step 2
D. Step 5
💡 Hint
Check the 'Condition Result' column to see when it becomes True for the first time.
According to variable_tracker, what is the Alert State after Step 4?
A. Firing
B. Inactive
C. Resolved
D. Unknown
💡 Hint
Look at the 'Alert State' row under 'After Step 4' column.
If the cpu_usage stayed above 80 for only 3 minutes, what would happen to the alert?
A. Alert would fire immediately
B. Alert would fire after 3 minutes
C. Alert would never fire
D. Alertmanager would send resolved notification
💡 Hint
Refer to Step 2 in the execution table, where the condition is false because the time threshold has not been met.
Concept Snapshot
Prometheus Alerting:
- Define alert rules with conditions and duration
- Prometheus evaluates rules regularly
- Alerts fire only if condition holds for specified time
- Alerts sent to Alertmanager
- Alertmanager groups alerts and sends notifications
- Alerts resolve when conditions clear
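In Kubernetes specifically, a common way to ship rules like the one above is the PrometheusRule custom resource from the Prometheus Operator. A sketch, assuming the operator (e.g. via kube-prometheus-stack) is installed; the metadata name, namespace, and selector label are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-alerts            # illustrative name
  namespace: monitoring       # assumed monitoring namespace
  labels:
    release: kube-prometheus  # label your Prometheus is configured to select (assumption)
spec:
  groups:
    - name: example
      rules:
        - alert: HighCpuUsage
          expr: cpu_usage > 80
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "CPU usage is above 80%"
```

The operator watches for PrometheusRule objects and reloads Prometheus with the embedded rule group, so the rule file never has to be mounted by hand.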
Full Transcript
Prometheus collects metrics and checks alert rules. If a metric exceeds a threshold for a set time, an alert fires. This alert is sent to Alertmanager, which groups similar alerts to avoid duplicates and sends notifications to users or systems. When the metric returns to normal, the alert resolves and Alertmanager sends a resolved notification. This process helps monitor system health and notify teams only when needed.