
REST API best practices in HLD - Scalability & System Analysis

Scalability Analysis - REST API best practices
Growth Table: REST API Scaling from 100 to 100M Users
| Users | Traffic Characteristics | API Server Load | Database Load | Network | Other Changes |
| --- | --- | --- | --- | --- | --- |
| 100 | Low requests/sec (~10-50) | Single server handles easily | Single DB instance, low QPS | Minimal bandwidth | Basic logging, no caching needed |
| 10,000 | Moderate requests/sec (~1,000-5,000) | Multiple API servers behind a load balancer | ~1,000 QPS; may need read replicas | Moderate bandwidth; consider a CDN for static assets | Introduce caching (Redis), rate limiting |
| 1,000,000 | High requests/sec (~100,000+) | Horizontal scaling of API servers, autoscaling | DB bottleneck likely; sharding or partitioning needed | High bandwidth; CDN critical for static assets and some API responses | Advanced caching, API gateway, throttling, monitoring |
| 100,000,000 | Very high requests/sec (millions) | Globally distributed API servers, multi-region load balancing | Multiple geo-distributed DB clusters, complex sharding | Very high bandwidth; multi-CDN strategy | Microservices, event-driven architecture, circuit breakers, extensive monitoring |
First Bottleneck

At small to medium scale, the database is the first bottleneck. It struggles to handle increasing query volume and complex joins. API servers and network usually handle load better initially. Without caching, DB load grows linearly with users.

Scaling Solutions
  • Horizontal scaling: Add more API servers behind load balancers to handle more concurrent requests.
  • Caching: Use Redis or Memcached to cache frequent API responses and reduce DB load.
  • Database read replicas: Offload read queries to replicas to reduce primary DB load.
  • Sharding/Partitioning: Split database by user ID or region to distribute load.
  • CDN: Cache static assets and some API responses close to users to reduce latency and bandwidth.
  • API Gateway: Manage rate limiting, authentication, and routing efficiently.
  • Monitoring and throttling: Detect and control traffic spikes to protect backend.
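The caching bullet above can be sketched as a cache-aside read path. This is a minimal illustration, assuming an in-memory dict stands in for Redis/Memcached; the names `fetch_user` and `db_query` are hypothetical, not from any real API.

```python
import time

# Cache-aside sketch: check the cache first, fall back to the DB on a miss,
# then populate the cache with a TTL so stale entries eventually expire.
cache = {}
TTL_SECONDS = 60

def db_query(user_id):
    """Simulated primary-database read (the expensive call we want to avoid)."""
    return {"id": user_id, "name": f"user-{user_id}"}

def fetch_user(user_id):
    entry = cache.get(user_id)
    now = time.time()
    if entry is not None and now - entry["at"] < TTL_SECONDS:
        return entry["value"]                      # cache hit: no DB load
    value = db_query(user_id)                      # cache miss: one DB read
    cache[user_id] = {"value": value, "at": now}   # populate for later reads
    return value
```

With a real Redis client the same pattern uses `GET`/`SET` with an expiry, and the TTL becomes the main tuning knob between freshness and DB offload.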
Back-of-Envelope Cost Analysis
  • At 10,000 users, expect ~1,000 QPS. A single DB instance can typically handle ~5,000 QPS, so one instance still suffices.
  • At 1M users, expect ~100,000 QPS. At ~5,000 QPS per node, that means ~20 DB replicas or sharded clusters.
  • Bandwidth: 1 Gbps = 125 MB/s. For 100,000 QPS with 1 KB response, need ~100 MB/s bandwidth.
  • API servers: Each handles ~5,000 concurrent connections. For 100,000 QPS, need ~20 servers.
  • Caching reduces DB QPS by 50-90%, lowering infrastructure cost.
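The estimates above follow from a few capacity assumptions; a small script makes the arithmetic explicit (the capacity constants are the assumptions from the bullets, not measured figures).

```python
import math

# Assumed capacities from the back-of-envelope bullets above.
DB_QPS_CAPACITY = 5_000        # QPS one DB instance can handle
SERVER_CONN_CAPACITY = 5_000   # concurrent connections per API server
RESPONSE_KB = 1                # average response size in KB

def plan(total_qps):
    """Return (db_nodes, api_servers, bandwidth_mb_s) for a target QPS."""
    db_nodes = math.ceil(total_qps / DB_QPS_CAPACITY)
    api_servers = math.ceil(total_qps / SERVER_CONN_CAPACITY)
    bandwidth_mb_s = total_qps * RESPONSE_KB / 1024  # KB/s -> MB/s
    return db_nodes, api_servers, bandwidth_mb_s

# At ~100,000 QPS (roughly the 1M-user row):
print(plan(100_000))  # → (20, 20, 97.65625)
```

The ~98 MB/s result matches the "~100 MB/s, within 1 Gbps" estimate above; halving DB QPS via caching halves `db_nodes` in the same way.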
Interview Tip

Structure your scalability discussion by first identifying the main components: API servers, database, network. Then estimate load at different user scales. Identify the first bottleneck (usually DB). Propose targeted solutions like caching, read replicas, and sharding. Discuss trade-offs and monitoring. Keep explanations simple and focused.

Self Check

Question: Your database handles 1000 QPS. Traffic grows 10x. What do you do first?

Answer: Introduce caching to reduce DB queries and add read replicas to distribute read load. This relieves DB pressure before considering sharding or more complex solutions.
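The "add read replicas" half of that answer boils down to routing: writes go to the primary, reads are spread across replicas. A minimal sketch, assuming string labels stand in for real connections and `Router` is an illustrative name, not a library class:

```python
import random

class Router:
    """Read/write splitter: primary for writes, random replica for reads."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql, is_write):
        target = self.primary if is_write else random.choice(self.replicas)
        return target, sql  # real code would run sql on target's connection

router = Router("primary-db", ["replica-1", "replica-2"])
target, _ = router.execute("SELECT * FROM users", is_write=False)
# reads land on a replica; writes would land on "primary-db"
```

Note the trade-off this implies: replicas lag the primary slightly, so read-your-own-writes flows may still need to hit the primary.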

Key Result
The database is the first bottleneck as user traffic grows; applying caching and read replicas early effectively delays costly sharding and complex scaling.