
Availability checking in LLD - Scalability & System Analysis

Growth Table: Availability Checking System

| Users | Requests per Second | Data Stored | Latency Requirements | System Changes |
|---|---|---|---|---|
| 100 | ~10-50 | Small (KBs) | Low (seconds) | Single server, simple DB |
| 10,000 | ~1,000 | MBs | Low (seconds) | Load balancer, caching, DB indexing |
| 1,000,000 | ~100,000 | GBs | Very low (milliseconds) | Horizontal scaling, DB sharding, CDN, async processing |
| 100,000,000 | ~10,000,000 | TBs | Very low (milliseconds) | Multi-region deployment, advanced caching, microservices |

First Bottleneck

At small scale, a single server and a simple database handle the load. As traffic grows, the database becomes the first bottleneck because it serves every availability check request and stores all status data. Its CPU and I/O limits are reached first, which shows up as slow responses.

Scaling Solutions
  • Database Read Replicas: Offload read queries to replicas to reduce load on the primary DB.
  • Caching: Use an in-memory cache (like Redis) to store recent availability results and reduce DB hits.
  • Horizontal Scaling: Add more application servers behind a load balancer to handle more requests.
  • Sharding: Partition the database by user or region to distribute load.
  • Asynchronous Processing: Use queues to process availability checks in the background so request handlers can respond quickly.
  • CDN: Cache static availability data closer to users to reduce latency.
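
As an illustration of the caching item above, here is a minimal cache-aside sketch. It is not a definitive implementation: the class name `AvailabilityCache`, the `load_from_db` callback, and the 5-second TTL are all hypothetical, and a plain dictionary stands in for Redis so the example runs on its own.

```python
import time
from typing import Callable, Dict, Tuple


class AvailabilityCache:
    """Cache-aside lookup with a short TTL (a dict stands in for Redis)."""

    def __init__(
        self,
        load_from_db: Callable[[str], bool],  # hypothetical DB lookup
        ttl_seconds: float = 5.0,             # assumption: tolerable staleness
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        # resource_id -> (cached availability, expiry timestamp)
        self._entries: Dict[str, Tuple[bool, float]] = {}
        self.db_hits = 0  # counts how often the database was actually queried

    def is_available(self, resource_id: str) -> bool:
        now = self._clock()
        entry = self._entries.get(resource_id)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh cache hit: no DB query
        # Miss or expired entry: read from the DB and repopulate the cache.
        self.db_hits += 1
        value = self._load_from_db(resource_id)
        self._entries[resource_id] = (value, now + self._ttl)
        return value

    def invalidate(self, resource_id: str) -> None:
        """Drop a cached entry, for example after a status change is written."""
        self._entries.pop(resource_id, None)
```

The TTL is the trade-off to discuss: a longer TTL removes more database reads but lets callers see stale availability for longer, so writes that change status would typically call `invalidate`.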
Back-of-Envelope Cost Analysis

For 1 million users with 100k requests/sec:

  • Database: Needs to handle ~100k QPS if every request reaches it, which requires sharding and replicas.
  • Cache: Should handle 100k+ ops/sec, which calls for a Redis cluster.
  • Bandwidth: Assuming 1 KB per request, ~100 MB/s (about 800 Mbit/s) of bandwidth is needed.
  • Storage: GBs to TBs, depending on data retention and history.
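
The arithmetic behind these estimates can be sketched in a few lines. Only the 100k requests/sec and 1 KB per request figures come from the text; the function name `estimate_capacity`, the 90% cache hit rate, and the per-node capacities are illustrative assumptions, so replace them with measured values.

```python
import math


def estimate_capacity(
    requests_per_sec: int,
    bytes_per_request: int = 1_000,  # 1 KB per request, as in the text
    cache_hit_percent: int = 90,     # assumption: share of reads served by cache
    db_node_qps: int = 10_000,       # assumption: sustainable QPS per DB node
    cache_node_ops: int = 100_000,   # assumption: sustainable ops/sec per cache node
) -> dict:
    """Rough sizing numbers for an availability checking service."""
    bandwidth_mb_per_sec = requests_per_sec * bytes_per_request / 1_000_000
    # Only cache misses reach the database.
    db_qps_with_cache = requests_per_sec * (100 - cache_hit_percent) // 100
    return {
        "bandwidth_mb_per_sec": bandwidth_mb_per_sec,
        "bandwidth_mbit_per_sec": bandwidth_mb_per_sec * 8,
        "db_nodes_without_cache": math.ceil(requests_per_sec / db_node_qps),
        "db_qps_with_cache": db_qps_with_cache,
        "db_nodes_with_cache": max(1, math.ceil(db_qps_with_cache / db_node_qps)),
        "cache_nodes": max(1, math.ceil(requests_per_sec / cache_node_ops)),
    }


if __name__ == "__main__":
    for key, value in estimate_capacity(100_000).items():
        print(f"{key}: {value}")
```

Under these assumed capacities, the cache cuts the database load from 100k QPS to 10k QPS, which is why caching is usually discussed before sharding.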
Interview Tip

Start by defining the scale and requirements. Identify the first bottleneck clearly. Discuss scaling solutions step-by-step, focusing on database and caching. Mention trade-offs and latency impact. Use real numbers to support your reasoning.

Self Check Question

Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first, and why?

Key Result
The database is the first bottleneck in availability checking systems as traffic grows; scaling requires caching, read replicas, and horizontal scaling to maintain low latency and high availability.