| Users | Traffic Characteristics | API Server Load | Database Load | Network | Other Changes |
|---|---|---|---|---|---|
| 100 users | Low requests/sec (~10-50) | Single server handles easily | Single DB instance, low QPS | Minimal bandwidth | Basic logging, no caching needed |
| 10,000 users | Moderate requests/sec (~1,000-5,000) | Multiple API servers behind load balancer | DB handles ~1,000 QPS, may need read replicas | Moderate bandwidth, consider CDN for static | Introduce caching (Redis), rate limiting |
| 1,000,000 users | High requests/sec (~100,000+) | Horizontal scaling of API servers, autoscaling | DB bottleneck likely, sharding or partitioning needed | High bandwidth, CDN critical for static and some API responses | Advanced caching, API gateway, throttling, monitoring |
| 100,000,000 users | Very high requests/sec (millions) | Global distributed API servers, multi-region load balancing | Multiple DB clusters, geo-distributed, complex sharding | Very high bandwidth, multi-CDN strategy | Microservices, event-driven, circuit breakers, extensive monitoring |
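The table mentions rate limiting and throttling from 10,000 users onward. A common building block for this is a token bucket; below is a minimal in-memory sketch (production gateways use distributed limiters, e.g. backed by Redis, so per-process state like this is an assumption for illustration):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100, capacity=10)  # ~100 req/s steady, bursts of 10
print(bucket.allow())  # True
```

`capacity` controls burst tolerance while `rate` caps sustained throughput; requests that return `False` would get an HTTP 429 at the gateway.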
## REST API best practices in HLD - Scalability & System Analysis
At small to medium scale, the database is usually the first bottleneck: it struggles with rising query volume and complex joins, while API servers and the network typically absorb load better at first. Without caching, DB load grows roughly linearly with the user count.
- Horizontal scaling: Add more API servers behind load balancers to handle more concurrent requests.
- Caching: Use Redis or Memcached to cache frequent API responses and reduce DB load.
- Database read replicas: Offload read queries to replicas to reduce primary DB load.
- Sharding/Partitioning: Split database by user ID or region to distribute load.
- CDN: Cache static assets and some API responses close to users to reduce latency and bandwidth.
- API Gateway: Manage rate limiting, authentication, and routing efficiently.
- Monitoring and throttling: Detect and control traffic spikes to protect backend.
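The caching bullet above is usually implemented as cache-aside: check the cache first, and only fall through to the database on a miss. A minimal sketch, using an in-memory dict with TTLs as a stand-in for Redis (the `get_user`/`db_fetch` names are illustrative, not a real API):

```python
import time

cache = {}  # key -> (value, expires_at); stand-in for Redis

def get_user(user_id, db_fetch, ttl=60):
    """Cache-aside read: serve from cache if fresh, else hit the DB and cache the result."""
    entry = cache.get(user_id)
    if entry and entry[1] > time.monotonic():
        return entry[0]                      # cache hit: no DB query
    value = db_fetch(user_id)                # cache miss: exactly one DB query
    cache[user_id] = (value, time.monotonic() + ttl)
    return value

calls = []
fetch = lambda uid: calls.append(uid) or {"id": uid}
get_user(42, fetch)
get_user(42, fetch)
print(len(calls))  # 1: the second read was served from cache
```

With Redis the same pattern maps onto `GET` and `SETEX`; the TTL bounds staleness and keeps hot keys from pinning the DB.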
- At 10,000 users, expect ~1,000 QPS. A single DB instance can typically sustain ~5,000 QPS, so one instance still suffices.
- At 1M users, ~100,000 QPS is likely. At ~5,000 QPS per instance, that means ~20 read replicas or sharded clusters.
- Bandwidth: 1 Gbps = 125 MB/s. For 100,000 QPS with 1 KB responses, you need ~100 MB/s, which fits within a single 1 Gbps link.
- API servers: each handles ~5,000 requests/sec, so 100,000 QPS needs ~20 servers.
- Caching reduces DB QPS by 50-90%, lowering infrastructure cost.
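The estimates above can be wrapped into one back-of-envelope function. All the per-instance figures are the rules of thumb from the bullets, not measured numbers:

```python
def estimate(qps, db_qps_per_instance=5000, server_qps=5000,
             resp_kb=1.0, cache_hit=0.0):
    """Back-of-envelope sizing from the rules of thumb above (all figures are assumptions)."""
    db_qps = qps * (1 - cache_hit)                        # caching absorbs hits before the DB
    return {
        "api_servers": -(-qps // server_qps),             # ceiling division
        "db_instances": -(-int(db_qps) // db_qps_per_instance),
        "bandwidth_MBps": qps * resp_kb / 1000,           # resp_kb KB per response
    }

print(estimate(100_000))                 # ~20 servers, ~20 DB instances, ~100 MB/s
print(estimate(100_000, cache_hit=0.9))  # a 90% cache hit rate cuts DB instances to 2
```

The second call shows the 50-90% reduction claim concretely: the same traffic needs a tenth of the database capacity once most reads hit the cache.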
Structure your scalability discussion by first identifying the main components: API servers, database, network. Then estimate load at different user scales. Identify the first bottleneck (usually DB). Propose targeted solutions like caching, read replicas, and sharding. Discuss trade-offs and monitoring. Keep explanations simple and focused.
Question: Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Introduce caching to reduce DB queries and add read replicas to distribute read load. This relieves DB pressure before considering sharding or more complex solutions.
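The read-replica half of this answer needs the application (or a proxy) to split traffic: writes go to the primary, reads fan out across replicas. A minimal routing sketch; the `Router` class and string handles are hypothetical stand-ins for real connection objects from a driver or ORM:

```python
import itertools

class Router:
    """Route reads round-robin across replicas and writes to the primary
    (sketch; real DB connections would sit behind these handles)."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def execute(self, sql):
        # Naive read detection: treat SELECT statements as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = next(self._replicas) if is_read else self.primary
        return target, sql

router = Router("primary", ["replica-1", "replica-2"])
print(router.execute("SELECT * FROM users")[0])   # replica-1
print(router.execute("SELECT * FROM users")[0])   # replica-2
print(router.execute("UPDATE users SET ...")[0])  # primary
```

One trade-off to mention in the interview: replicas lag the primary, so reads that must see a just-committed write should still go to the primary.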