| Scale | Throughput (requests/sec) | Latency (ms) | Availability (%) | What Changes? |
|---|---|---|---|---|
| 100 users | ~50 | ~50 | 99.9 | Single server handles requests easily; low latency; simple setup |
| 10,000 users | ~5,000 | ~100 | 99.95 | Need load balancer; some caching; latency slightly increases |
| 1,000,000 users | ~500,000 | ~200 | 99.99 | Multiple servers; database replicas; CDN; latency affected by network |
| 100,000,000 users | ~50,000,000 | ~300+ | 99.999 | Global distribution; sharding; advanced caching; complex failover |
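The availability column above maps directly to allowed downtime. A minimal sketch of that conversion (the 365.25-day year is an assumption for the arithmetic):

```python
# Convert the availability percentages from the table into allowed
# downtime per year, assuming a 365.25-day year.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.95, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
```

At 99.9% a system may be down roughly 526 minutes (about 8.8 hours) a year; at 99.999% ("five nines") only about 5 minutes, which is why the higher rows of the table demand global failover.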
Throughput, Latency, and Availability in HLD - Scalability & System Analysis
At small scale, a single application server's CPU and memory comfortably handle the load, keeping throughput high and latency low.
As users grow to thousands, the database becomes the first bottleneck because it handles many read/write operations.
At millions of users, network bandwidth and latency limit performance, affecting availability.
- Horizontal scaling: Add more servers behind load balancers to increase throughput and keep latency stable under load.
- Caching: Use in-memory caches (like Redis) to reduce database load and improve latency.
- Database replication: Use read replicas to spread read traffic and improve availability.
- Sharding: Split database by user or data to handle large scale writes and reads.
- Content Delivery Network (CDN): Cache static content closer to users to reduce latency globally.
- Failover and redundancy: Use multiple data centers and automatic failover to improve availability.
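The caching technique above is most often applied as the cache-aside pattern: check the cache first, and fall back to the database only on a miss. A minimal sketch, where a plain dict stands in for Redis and `fetch_user_from_db` is a hypothetical stand-in for a real query:

```python
# Cache-aside sketch. A plain dict stands in for Redis here; in
# production you would swap it for a Redis client and set a TTL
# so entries expire instead of living forever.

cache: dict[str, str] = {}

def fetch_user_from_db(user_id: str) -> str:
    # Hypothetical stand-in for a real database query.
    return f"user-record-{user_id}"

def get_user(user_id: str) -> str:
    if user_id in cache:                      # cache hit: skip the database
        return cache[user_id]
    record = fetch_user_from_db(user_id)      # cache miss: read from the DB...
    cache[user_id] = record                   # ...then populate the cache
    return record
```

Every repeated read for the same user is now served from memory, which is how caching cuts database load and latency at the 10,000-user tier of the table.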
- At 1,000 users: ~500 requests/sec, easily handled by 1 server with a 1 Gbps network.
- At 1 million users: ~500,000 requests/sec, need ~100 servers (assuming 5,000 req/sec/server).
- Database: Single instance handles ~10,000 QPS; need replicas and sharding beyond that.
- Bandwidth: 1 Gbps = 125 MB/s; high throughput requires multiple network interfaces or data centers.
- Latency: Network and disk I/O dominate; caching reduces disk reads and improves response times.
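The server count above comes from a simple ratio, which can be sketched directly (the per-user request rate is implied by the table, and the per-server capacity is the assumption stated above):

```python
# Capacity estimate from the numbers above: total request rate divided
# by per-server capacity, rounded up to whole servers.

import math

REQ_PER_SEC_PER_USER = 0.5    # implied by the table: 100 users ~ 50 req/sec
SERVER_CAPACITY_RPS = 5_000   # assumed per-server limit from the list above

def servers_needed(users: int) -> int:
    total_rps = users * REQ_PER_SEC_PER_USER
    return math.ceil(total_rps / SERVER_CAPACITY_RPS)

print(servers_needed(1_000_000))  # -> 100
```

This kind of two-line estimate is usually all an interviewer expects: state the assumptions, multiply, divide, round up.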
Start by defining throughput, latency, and availability in simple terms.
Explain how each metric affects user experience and system design.
Discuss bottlenecks at different scales and propose targeted solutions.
Use real numbers to show understanding of system limits and scaling techniques.
Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas to distribute read load and implement caching to reduce database hits. Consider sharding if writes also increase significantly.
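The read-replica half of that answer usually means splitting traffic at the application layer: writes go to the primary, reads rotate across replicas. A minimal sketch with hypothetical connection objects (a real client would execute the query instead of returning it):

```python
# Read/write splitting sketch: round-robin reads across replicas,
# all writes to the primary. Connection objects are hypothetical
# placeholders; here they are just labels for testing the routing.

import itertools

class ReplicatedDB:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)  # round-robin iterator

    def execute_read(self, query: str):
        replica = next(self._replica_cycle)  # spread reads across replicas
        return replica, query                # a real client would run the query

    def execute_write(self, query: str):
        return self.primary, query           # writes always hit the primary

db = ReplicatedDB("primary", ["replica-1", "replica-2"])
```

With two replicas, read capacity roughly triples while the primary only absorbs writes; if write volume itself grows past one machine, that is when sharding enters the picture.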