| Users/Traffic | What Changes? |
|---|---|
| 100 users | Single API server handles requests; simple database; no caching needed; low latency. |
| 10,000 users | Need load balancer; multiple API servers; database read replicas; introduce caching layer (e.g., Redis); rate limiting. |
| 1,000,000 users | API servers scaled horizontally with auto-scaling; database sharding; CDN for static content; advanced caching; API gateway for routing and security; asynchronous processing for heavy tasks. |
| 100,000,000 users | Global distributed API servers; multi-region database clusters; aggressive caching and edge computing; microservices split; strict rate limiting and quota management; event-driven architecture for scalability. |
## REST API design for systems in HLD - Scalability & System Analysis
At small to medium scale, the database is the first bottleneck. It struggles to keep up with high query rates and the many concurrent connections opened by API servers, which increases latency and risks downtime.
- Database: Use read replicas to distribute read load; implement connection pooling; shard data by user or region.
- API Servers: Scale horizontally behind a load balancer; use stateless design for easy scaling.
- Caching: Add in-memory caches (Redis/Memcached) to reduce database hits for frequent queries.
- CDN: Serve static content and cache API responses at edge locations to reduce latency.
- API Gateway: Manage routing, authentication, rate limiting, and monitoring centrally.
- Asynchronous Processing: Offload heavy or long-running tasks to background workers or message queues.
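The caching bullet above is usually implemented as the cache-aside pattern: check the cache first, and only hit the database on a miss. A minimal Python sketch, using an in-memory dict as a stand-in for Redis; `fetch_user_from_db`, the key format, and the TTL are illustrative assumptions:

```python
import time

# Stand-in for Redis: key -> (value, expiry timestamp)
cache = {}
CACHE_TTL_SECONDS = 60  # illustrative TTL

def fetch_user_from_db(user_id):
    # Placeholder for a real database query (assumption)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside read path: cache hit avoids the database entirely."""
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]  # cache hit: no database round-trip
    value = fetch_user_from_db(user_id)  # cache miss: query the DB
    cache[key] = (value, time.time() + CACHE_TTL_SECONDS)
    return value
```

With a real Redis deployment the dict lookup becomes a `GET` and the write a `SET` with an expiry; the TTL bounds how stale a cached entry can get, which is the consistency-vs-load trade-off caching introduces.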
- At 10,000 users, expect ~1,000 QPS (queries per second), assuming 1 request per user every 10 seconds.
- Database: must handle ~1,000 QPS. A single PostgreSQL instance can typically sustain up to ~5,000 QPS for simple queries, so one instance suffices today, but read replicas add headroom for traffic spikes and further growth.
- API servers: each can handle ~2,000 concurrent connections; 3-5 servers are recommended for redundancy and load distribution.
- Bandwidth: 1,000 QPS of ~1 KB JSON payloads is only ~1 MB/s, well within a 1 Gbps (~125 MB/s) link.
- Storage: depends on data retention; at 1 million users, expect tens to hundreds of GB of new data per month.
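The estimates above are simple arithmetic; a quick back-of-envelope check in Python, using the rough figures stated above (request rate, payload size, and link capacity are all assumptions from the list):

```python
# Back-of-envelope capacity check for the 10,000-user scenario
users = 10_000
requests_per_user_per_sec = 1 / 10              # one request every 10 seconds
qps = users * requests_per_user_per_sec         # expected load: 1,000 QPS

payload_kb = 1                                  # typical JSON payload (assumption)
bandwidth_mb_per_sec = qps * payload_kb / 1024  # resulting traffic volume

link_capacity_mb_per_sec = 125                  # 1 Gbps network
print(f"QPS: {qps:.0f}, bandwidth: {bandwidth_mb_per_sec:.2f} MB/s "
      f"of {link_capacity_mb_per_sec} MB/s available")
```

Working the numbers like this in an interview shows that at this scale the network is nowhere near saturation, which is why the database, not bandwidth, is the bottleneck to discuss first.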
Start by clarifying API usage patterns and expected traffic. Identify the main components: API servers, database, caching, and network. Discuss bottlenecks in order: database first, then servers, then network. Propose scaling solutions step-by-step with reasons. Mention trade-offs like consistency vs availability. Keep answers structured and focused.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas to distribute read queries and reduce load on the primary database. Also, implement caching to reduce database hits. This addresses the database bottleneck before scaling API servers.
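The answer above, replicas for reads and the primary for writes, implies a routing layer in front of the database connections. A minimal Python sketch with round-robin replica selection; the string connection names are illustrative stand-ins for real connection objects:

```python
import itertools

class ReadWriteRouter:
    """Route writes to the primary; spread reads round-robin across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # endless round-robin

    def connection_for(self, is_write):
        # Writes must go to the primary to keep a single source of truth;
        # reads can tolerate slight replication lag, so any replica works.
        return self.primary if is_write else next(self._replicas)

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
```

Note the trade-off this encodes: replicas lag the primary slightly, so reads may return marginally stale data, which is the consistency-vs-availability point worth raising in the interview.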