Alerting thresholds in HLD - Architecture Diagram

System Overview - Alerting thresholds

This system monitors application and infrastructure metrics to detect issues early. It uses alerting thresholds to trigger notifications when metrics cross defined limits, helping teams respond quickly to problems.
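As a rough illustration of what "crossing a defined limit" can mean in practice, here is a minimal sketch in Python. The `Threshold` class, the `direction` field, and the metric names are assumptions made for this example; they are not part of the design above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical threshold rule for one metric."""
    metric: str
    limit: float
    direction: str = "above"  # "above": fire when value > limit; "below": value < limit


def breached(rule: Threshold, value: float) -> bool:
    """Return True when the observed value crosses the rule's limit."""
    if rule.direction == "above":
        return value > rule.limit
    if rule.direction == "below":
        return value < rule.limit
    raise ValueError(f"unknown direction: {rule.direction!r}")


def evaluate(rules: list[Threshold], sample: dict[str, float]) -> list[str]:
    """Return one alert message per rule that the sample breaches."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        # Metrics missing from the sample are skipped rather than treated as breaches.
        if value is not None and breached(rule, value):
            alerts.append(f"{rule.metric}={value} is {rule.direction} limit {rule.limit}")
    return alerts
```

A real alerting engine would usually also require the breach to persist for some duration before firing, to avoid alerting on short spikes; that is left out here to keep the sketch small.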

Architecture Diagram

User
  |
  v
Load Balancer
  |
  v
API Gateway
  |
  v
Metrics Collector ---> Cache
  |                    |
  v                    v
Alerting Engine <---- Database
  |
  v
Notification Service
  |
  v
User

Components

User (user): Receives alerts and configures thresholds

Load Balancer (load_balancer): Distributes incoming API requests evenly

API Gateway (api_gateway): Routes requests to Metrics Collector and Alerting Engine

Metrics Collector (service): Collects real-time metrics from monitored systems

Cache (cache): Stores recent metrics for fast access

Database (database): Stores historical metrics and alert configurations

Alerting Engine (service): Evaluates metrics against alerting thresholds

Notification Service (service): Sends alerts to users via email, SMS, or other channels

Request Flow - 9 Hops

User → Load Balancer

Load Balancer → API Gateway

API Gateway → Database

Metrics Collector → Cache

Metrics Collector → Database

Alerting Engine → Cache

Alerting Engine → Database

Alerting Engine → Notification Service

Notification Service → User
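The hops above can be sketched end to end with in-memory stand-ins. This is only an illustration: `InMemoryStore`, `collect`, and `evaluate_and_notify` are hypothetical names, and it assumes that the first three hops represent a user saving a threshold and that a breach means the value is strictly above the limit.

```python
class InMemoryStore:
    """Stand-in used for both the Cache and the Database in this sketch."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)


def collect(metric, value, cache, database):
    """Metrics Collector: keep the latest value in the cache, append to history in the database."""
    cache.put(metric, value)
    history = database.get(("history", metric), [])
    database.put(("history", metric), history + [value])


def evaluate_and_notify(metric, cache, database, notify):
    """Alerting Engine: read the recent value from the cache and the limit from the database.

    Calls notify(message) and returns True on a breach, otherwise returns False.
    """
    value = cache.get(metric)
    limit = database.get(("threshold", metric))
    if value is not None and limit is not None and value > limit:
        notify(f"ALERT {metric}: {value} > {limit}")
        return True
    return False
```

In use, a threshold is first stored with `database.put(("threshold", "cpu_percent"), 90.0)`, then `collect("cpu_percent", 97.5, cache, database)` records a sample, and `evaluate_and_notify("cpu_percent", cache, database, sent.append)` hands the alert text to whatever callable plays the role of the Notification Service.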

Failure Scenario

Component Fails: Database

Impact: New threshold configurations cannot be saved or read; historical metrics are unavailable; the Alerting Engine may use stale data

Mitigation: Use database replication and failover; the Alerting Engine temporarily relies on the cache for recent metrics and thresholds
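One way the cache fallback could look is sketched below. The `DatabaseUnavailable` exception, the `load_thresholds` function, and the use of a plain dict as the cache are assumptions for illustration, not a prescribed implementation.

```python
class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the database cannot be reached."""


def load_thresholds(read_from_db, cache):
    """Return (thresholds, is_stale).

    Tries the database first and refreshes the cached copy on success.
    If the database is down, falls back to the last cached copy and flags
    the result as stale so callers can surface that fact.
    """
    try:
        thresholds = read_from_db()
    except DatabaseUnavailable:
        cached = cache.get("thresholds")
        if cached is None:
            raise  # nothing cached yet, so the failure cannot be masked
        return cached, True
    cache["thresholds"] = thresholds
    return thresholds, False
```

Returning the stale flag alongside the data keeps the "may use stale data" impact visible to the caller instead of hiding it behind the fallback.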

Architecture Quiz - 3 Questions

Test your understanding

Which component stores recent metrics for fast access?

A. Database

B. API Gateway

C. Cache

D. Notification Service

Design Principle

This architecture uses caching to reduce latency for metric reads and alert evaluation. It separates concerns across dedicated services for metrics collection, alert evaluation, and notification. The load balancer spreads incoming requests across instances so the system can scale, and the API gateway routes each request to the right service. When the database fails, replication and failover restore it, while the cache keeps recent metrics and thresholds available in the meantime.