+-----------------+       +---------------------+       +---------------------+
|  Microservices  | ----> |    Metrics/Event    | ----> |  Alert Evaluation   |
| (1000 services) |       |      Collector      |       |       Engine        |
+-----------------+       +---------------------+       +----------+----------+
                                                                   |
                                                                   v
                                                    +-----------------------------+
                                                    |     Alert Aggregation &     |
                                                    |    Deduplication Module     |
                                                    +--------------+--------------+
                                                                   |
                                                                   v
                                                    +-----------------------------+
                                                    |    Notification Service     |
                                                    |   (Email, SMS, Dashboard)   |
                                                    +--------------+--------------+
                                                                   |
                                                                   v
                                                    +-----------------------------+
                                                    | Alert Storage & History DB  |
                                                    +--------------+--------------+
                                                                   |
User Interface <---------------------------------------------------+
(Manage alerts, acknowledge, escalate)
Components
Metrics/Event Collector
Prometheus exporters, Fluentd, or custom agents
Collect metrics and events from microservices for alert evaluation
Alert Evaluation Engine
Rule engine or custom service using PromQL or similar
Evaluate incoming metrics/events against alert rules and thresholds
Alert Aggregation & Deduplication Module
In-memory cache or stream processor like Apache Kafka Streams
Group similar alerts to reduce noise and avoid alert storms
Notification Service
SMTP servers, Twilio SMS API, and a WebSocket or REST API for the dashboard
Send alerts to users by email and SMS, and push updates to dashboards
Alert Storage & History DB
PostgreSQL or Cassandra
Store alert records, status, acknowledgements, and escalation history
User Interface
React or Angular web app
Allow users to view, acknowledge, and manage alerts
Authentication & Authorization
OAuth2 and/or JWT bearer tokens
Secure access to alert management UI and APIs
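As a rough illustration of what the Alert Evaluation Engine does, the sketch below matches incoming samples against rule records whose fields mirror the AlertRule entity in the schema below (`metric_name`, `threshold`, `severity`, `enabled`). It assumes a simple "value greater than threshold" comparison; the names `Sample`, `AlertEvent`, and `evaluate` are hypothetical, and a PromQL-based engine would express the same check as a query instead.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class AlertRule:
    # Mirrors the AlertRule entity; the comparison operator is assumed to be ">".
    id: int
    microservice_id: int
    metric_name: str
    threshold: float
    severity: str
    enabled: bool = True


@dataclass(frozen=True)
class Sample:
    # Hypothetical shape of one metric reading forwarded by the collector.
    microservice_id: int
    metric_name: str
    value: float
    timestamp: float


@dataclass(frozen=True)
class AlertEvent:
    alert_rule_id: int
    microservice_id: int
    severity: str
    timestamp: float
    message: str


def evaluate(samples: Iterable[Sample], rules: Iterable[AlertRule]) -> List[AlertEvent]:
    """Return one AlertEvent for every (sample, enabled rule) pair that breaches the threshold."""
    # Index enabled rules by (service, metric) so each sample is matched with a dict lookup.
    index: Dict[Tuple[int, str], List[AlertRule]] = {}
    for rule in rules:
        if rule.enabled:
            index.setdefault((rule.microservice_id, rule.metric_name), []).append(rule)

    events: List[AlertEvent] = []
    for s in samples:
        for rule in index.get((s.microservice_id, s.metric_name), []):
            if s.value > rule.threshold:
                events.append(AlertEvent(
                    alert_rule_id=rule.id,
                    microservice_id=s.microservice_id,
                    severity=rule.severity,
                    timestamp=s.timestamp,
                    message=f"{s.metric_name}={s.value} exceeds {rule.threshold}",
                ))
    return events


# Usage: rule 1 fires; rule 2 is disabled, so the latency sample is ignored.
rules = [AlertRule(1, 7, "error_rate", 0.05, "critical"),
         AlertRule(2, 7, "latency_ms", 500.0, "warning", enabled=False)]
samples = [Sample(7, "error_rate", 0.12, 1000.0), Sample(7, "latency_ms", 900.0, 1000.0)]
print(len(evaluate(samples, rules)))  # 1
```

A production engine would usually also require the condition to hold for some time before firing (Prometheus alerting rules express this with a `for` duration), which avoids alerts on a single noisy sample.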
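One common way to implement the grouping described for the Alert Aggregation & Deduplication Module is to compute a fingerprint per alert (here, assumed to be the rule id plus the service id) and fold repeats of the same fingerprint that arrive within a time window into one group. This is a single-process sketch under that assumption; the `Deduplicator` class and the 300-second default window are illustrative, and a Kafka Streams deployment would typically keep the equivalent state in a keyed state store rather than a dict.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Fingerprint = Tuple[int, int]  # (alert_rule_id, microservice_id), an assumed grouping key


@dataclass
class AlertGroup:
    fingerprint: Fingerprint
    first_seen: float
    last_seen: float
    count: int = 1


class Deduplicator:
    """Fold repeats of the same fingerprint arriving within `window` seconds of the last one."""

    def __init__(self, window: float = 300.0) -> None:
        self.window = window
        self.groups: Dict[Fingerprint, AlertGroup] = {}

    def offer(self, alert_rule_id: int, microservice_id: int, ts: float) -> Optional[AlertGroup]:
        """Return a new group if the alert should be forwarded, or None if it is a duplicate."""
        key = (alert_rule_id, microservice_id)
        group = self.groups.get(key)
        if group is not None and ts - group.last_seen <= self.window:
            # Duplicate inside the window: count it and extend the group, but do not forward.
            group.count += 1
            group.last_seen = ts
            return None
        # First occurrence, or the previous group has been quiet for longer than the window.
        group = AlertGroup(fingerprint=key, first_seen=ts, last_seen=ts)
        self.groups[key] = group
        return group
```

Because the window is measured from `last_seen`, an alert that keeps firing stays folded into a single group. Periodic reminders (similar in spirit to Alertmanager's `repeat_interval`) would instead compare against the time of the last notification sent.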
Request Flow
1. Microservices continuously emit metrics and events.
2. The Metrics/Event Collector gathers the data and forwards it to the Alert Evaluation Engine.
3. The Alert Evaluation Engine checks the data against the configured alert rules.
4. When a rule triggers, an alert event is created and sent to the Alert Aggregation & Deduplication Module.
5. The Aggregation Module groups similar alerts and suppresses duplicates.
6. Aggregated alerts are sent to the Notification Service for delivery.
7. The Notification Service sends alerts by email and SMS and updates the dashboard.
8. Alert details and status are saved in the Alert Storage & History DB.
9. Users access the UI to view, acknowledge, or escalate alerts as needed.
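Steps 6 to 8 can be sketched as a fan-out that attempts each channel independently and produces one record per attempt, shaped like the Notification entity in the schema below. The sender callables, the `fan_out` name, and the status values `sent`, `failed`, and `skipped` are assumptions for illustration; real senders would wrap an SMTP client, an SMS provider such as Twilio, or a dashboard push.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A sender takes (recipient, message) and raises on failure. These are plain callables
# here; in practice they would wrap SMTP, an SMS API, or a WebSocket/REST dashboard push.
Sender = Callable[[str, str], None]


@dataclass
class NotificationRecord:
    # Mirrors the Notification entity: (alert_id, channel, status, sent_timestamp).
    alert_id: int
    channel: str
    status: str
    sent_timestamp: Optional[float]


def fan_out(alert_id: int, message: str, recipients: Dict[str, str],
            senders: Dict[str, Sender]) -> List[NotificationRecord]:
    """Send one alert on every configured channel; a failing channel does not block the others."""
    records: List[NotificationRecord] = []
    for channel, recipient in recipients.items():
        sender = senders.get(channel)
        if sender is None:
            # No sender configured for this channel (hypothetical "skipped" status).
            records.append(NotificationRecord(alert_id, channel, "skipped", None))
            continue
        try:
            sender(recipient, message)
            records.append(NotificationRecord(alert_id, channel, "sent", time.time()))
        except Exception:
            # Keep the failure as a record so it can be retried or escalated later.
            records.append(NotificationRecord(alert_id, channel, "failed", None))
    return records
```

Recording failures instead of raising lets one unavailable provider (for example, the SMS gateway) degrade delivery without blocking email or the dashboard, and the returned records can be written to the Alert Storage & History DB as step 8 describes.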
Database Schema
Entities:
- Microservice (id, name, owner)
- AlertRule (id, microservice_id, metric_name, threshold, severity, enabled)
- Alert (id, alert_rule_id, timestamp, status [triggered, acknowledged, resolved], message)
- Notification (id, alert_id, channel [email, sms, dashboard], status, sent_timestamp)
- User (id, name, email, phone, role)
- AlertAcknowledgement (id, alert_id, user_id, timestamp)
Relationships:
- Microservice 1:N AlertRule
- AlertRule 1:N Alert
- Alert 1:N Notification
- Alert 1:0..1 AlertAcknowledgement (an alert has at most one acknowledgement)
- User 1:N AlertAcknowledgement
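The entities and relationships above translate fairly directly into relational DDL. The sketch below uses Python's built-in `sqlite3` module so it runs without a database server; the column types, the CHECK constraints, and the table name `app_user` (chosen because `user` is a reserved word in PostgreSQL) are assumptions, and a PostgreSQL or Cassandra deployment would differ in types and indexing. The `UNIQUE` constraint on `alert_acknowledgement.alert_id` is what enforces the optional one-to-one link between an alert and its acknowledgement.

```python
import sqlite3

SCHEMA = """
CREATE TABLE microservice (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE,
    owner TEXT
);
CREATE TABLE alert_rule (
    id              INTEGER PRIMARY KEY,
    microservice_id INTEGER NOT NULL REFERENCES microservice(id),  -- Microservice 1:N AlertRule
    metric_name     TEXT NOT NULL,
    threshold       REAL NOT NULL,
    severity        TEXT NOT NULL,
    enabled         INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE alert (
    id            INTEGER PRIMARY KEY,
    alert_rule_id INTEGER NOT NULL REFERENCES alert_rule(id),      -- AlertRule 1:N Alert
    timestamp     TEXT NOT NULL,
    status        TEXT NOT NULL CHECK (status IN ('triggered', 'acknowledged', 'resolved')),
    message       TEXT
);
CREATE TABLE notification (
    id             INTEGER PRIMARY KEY,
    alert_id       INTEGER NOT NULL REFERENCES alert(id),          -- Alert 1:N Notification
    channel        TEXT NOT NULL CHECK (channel IN ('email', 'sms', 'dashboard')),
    status         TEXT NOT NULL,
    sent_timestamp TEXT
);
CREATE TABLE app_user (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT,
    phone TEXT,
    role  TEXT NOT NULL
);
CREATE TABLE alert_acknowledgement (
    id        INTEGER PRIMARY KEY,
    alert_id  INTEGER NOT NULL UNIQUE REFERENCES alert(id),  -- UNIQUE: at most one ack per alert
    user_id   INTEGER NOT NULL REFERENCES app_user(id),      -- User 1:N AlertAcknowledgement
    timestamp TEXT NOT NULL
);
"""


def create_schema(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite only enforces REFERENCES clauses when this per-connection pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```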
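The `status` field on Alert (triggered, acknowledged, resolved) implies a small lifecycle. The sketch below assumes the allowed transitions are triggered to acknowledged, triggered to resolved, and acknowledged to resolved, with resolved as a terminal state; the source does not spell these rules out, and the `transition` and `acknowledge` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Assumed lifecycle: an alert may be acknowledged before it resolves, or resolve directly.
# "resolved" is terminal here; a re-fire would create a new Alert row instead.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "triggered": frozenset({"acknowledged", "resolved"}),
    "acknowledged": frozenset({"resolved"}),
    "resolved": frozenset(),
}


@dataclass
class Alert:
    id: int
    status: str = "triggered"
    acknowledged_by: Optional[int] = None    # user_id from the single AlertAcknowledgement
    acknowledged_at: Optional[float] = None


def transition(alert: Alert, new_status: str) -> None:
    """Move an alert to `new_status`, rejecting transitions the lifecycle does not allow."""
    if new_status not in ALLOWED.get(alert.status, frozenset()):
        raise ValueError(f"illegal transition {alert.status!r} -> {new_status!r}")
    alert.status = new_status


def acknowledge(alert: Alert, user_id: int, ts: float) -> None:
    # Only a triggered alert can move to "acknowledged", so this also guarantees at most
    # one acknowledgement per alert, matching the optional 1:1 relationship.
    transition(alert, "acknowledged")
    alert.acknowledged_by = user_id
    alert.acknowledged_at = ts
```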