
Emergency handling in LLD - Scalability & System Analysis

Growth Table: Emergency Handling System

| Metric | 100 Users | 10,000 Users | 1,000,000 Users | 100,000,000 Users |
|---|---|---|---|---|
| Event Volume | ~10 events/min | ~1,000 events/min | ~100,000 events/min | ~10,000,000 events/min |
| System Components | Single server, simple alerting | Multiple servers, basic load balancing | Distributed servers, advanced routing | Global distributed system, multi-region failover |
| Database Load | Low, single instance | Moderate, read replicas | High, sharded database | Very high, multi-shard, geo-distributed DB |
| Alerting Latency | Seconds | Seconds to sub-second | Sub-second | Milliseconds |
| Storage Needs | GBs | TBs | Petabytes | Exabytes |
| Network Bandwidth | Low | Moderate | High | Very High |
First Bottleneck

The database is the first bottleneck as event volume grows. Emergency handling systems require fast writes and reads for alerts and logs. At around 10,000 users generating thousands of events per minute, a single database instance struggles with write throughput and query latency.

Scaling Solutions
  • Horizontal Scaling: Add more application servers behind load balancers to handle increased event processing.
  • Database Read Replicas: Use replicas to offload read queries and reduce latency.
  • Sharding: Partition the database by event type or region to distribute load.
  • Caching: Cache frequent queries and alert statuses in fast in-memory stores like Redis.
  • Message Queues: Use queues to buffer incoming events and smooth spikes in traffic.
  • CDN and Edge Computing: For alert delivery (e.g., notifications), use CDNs and edge nodes to reduce latency globally.
  • Multi-region Deployment: Deploy system components in multiple regions for fault tolerance and disaster recovery.
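Two of the steps above, queue buffering and shard routing, can be sketched together. This is a minimal in-process illustration, not a production design: `EventBuffer` stands in for a real message queue (e.g. Kafka or SQS), and `shard_for`, `NUM_SHARDS`, and the region-based shard key are assumptions chosen for the example.

```python
import hashlib
from collections import deque

NUM_SHARDS = 4  # hypothetical shard count for this sketch


def shard_for(region: str) -> int:
    """Route an event to a database shard by hashing its region key.

    Hashing keeps the assignment deterministic, so all events for a
    region land on the same shard.
    """
    digest = hashlib.sha256(region.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


class EventBuffer:
    """In-memory stand-in for a message queue that smooths traffic spikes.

    Producers enqueue as fast as events arrive; workers drain fixed-size
    batches at the rate the database can sustain.
    """

    def __init__(self, batch_size: int = 100):
        self.batch_size = batch_size
        self.queue = deque()

    def enqueue(self, event: dict) -> None:
        self.queue.append(event)

    def drain_batch(self) -> list:
        """Pop up to batch_size events for a downstream worker."""
        batch = []
        while self.queue and len(batch) < self.batch_size:
            batch.append(self.queue.popleft())
        return batch
```

The key property is decoupling: a burst of incoming events grows the queue rather than overloading the database, and the shard function spreads the eventual writes across instances.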
Back-of-Envelope Cost Analysis
  • At 10,000 users generating ~1,000 events/min (~17 events/sec), the system needs to handle ~17 writes/sec plus reads.
  • Database write capacity: A single PostgreSQL instance can typically handle on the order of ~5,000 simple QPS on modest hardware, so the write load is manageable initially.
  • Storage: Assuming 1 KB per event, 1,000 events/min ≈ 1 MB/min ≈ 43 GB/month before compression or retention limits.
  • Network bandwidth: 1,000 events/min * 1 KB = ~17 KB/sec, very low at this scale.
  • At 1 million users (~100,000 events/min), write load is ~1,666 QPS, requiring sharded DB and caching.
  • Bandwidth and storage scale accordingly, requiring distributed storage and efficient data retention policies.
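The arithmetic above can be checked with a short helper. `load_estimates` and the 1 KB event size are assumptions of this sketch, matching the estimates in the bullets (note the storage figure uses binary MB/GB, so it comes out slightly under the decimal ~43 GB/month).

```python
EVENT_SIZE_KB = 1  # assumed average event payload


def load_estimates(events_per_min: int) -> dict:
    """Back-of-envelope write rate, bandwidth, and storage for an event stream."""
    writes_per_sec = events_per_min / 60
    mb_per_min = events_per_min * EVENT_SIZE_KB / 1024
    gb_per_month = mb_per_min * 60 * 24 * 30 / 1024  # 30-day month
    return {
        "writes_per_sec": round(writes_per_sec, 1),
        "bandwidth_kb_per_sec": round(events_per_min * EVENT_SIZE_KB / 60, 1),
        "storage_gb_per_month": round(gb_per_month, 1),
    }
```

Running it for the two scales in the bullets confirms ~17 writes/sec at 10,000 users and ~1,667 writes/sec at 1 million users, the point at which a single instance needs sharding and caching.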
Interview Tip

Start by clarifying the expected event volume and latency requirements. Discuss the data flow from event ingestion to alerting. Identify the database as the likely bottleneck early. Propose incremental scaling steps: caching, read replicas, sharding, and multi-region deployment. Emphasize fault tolerance and disaster recovery in emergency systems.

Self Check

Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas to offload read queries and implement caching to reduce database load. If writes are the bottleneck, consider sharding the database to distribute write load across multiple instances.
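The caching half of that answer can be illustrated with a read-through cache. This is a minimal sketch: `ReadThroughCache` is a hypothetical in-process stand-in for Redis, and the TTL and `loader` callback are assumptions of the example, not a specific library API.

```python
import time


class ReadThroughCache:
    """Minimal read-through cache with TTL, standing in for Redis.

    On a miss, the loader (i.e. the database query) is called and the
    result is cached, so repeated reads of hot keys skip the database.
    """

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = loader(key)  # fall through to the database
        self.store[key] = (value, now + self.ttl)
        return value
```

With a high hit rate on hot keys such as alert statuses, most of the 10x read traffic never reaches the database, which is why caching is usually the first lever to pull.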

Key Result
Emergency handling systems first hit database bottlenecks as event volume grows; scaling requires caching, read replicas, sharding, and multi-region deployment to maintain low latency and high availability.