| Scale | Users | Data Volume | Traffic | System Behavior |
|---|---|---|---|---|
| Small | 100 users | Low (MBs) | Low (few QPS) | Single server handles all; simple design |
| Medium | 10,000 users | Moderate (GBs) | Moderate (hundreds QPS) | Server CPU/memory stressed; DB load increases |
| Large | 1,000,000 users | High (TBs) | High (thousands QPS) | Single server insufficient; DB bottleneck; latency rises |
| Very Large | 100,000,000 users | Very High (PBs) | Very High (hundreds of thousands QPS) | Need distributed systems; partitioning; multi-region |
Why Distributed Patterns Solve Common Scalability Challenges in HLD
At small scale, a single server and database can handle all requests. As users grow, the database typically becomes the first bottleneck because a single instance can only process a limited number of queries per second (roughly 5,000-10,000 QPS, depending on hardware and query complexity). CPU and memory on the application server also come under pressure as traffic increases. Network bandwidth and storage limits appear at very large scale.
Distributed patterns solve these by spreading load across many machines, avoiding single points of failure and scaling horizontally.
- Horizontal Scaling: Add more servers to share load, improving capacity and fault tolerance.
- Load Balancing: Distribute incoming requests evenly to prevent overload on any one server.
- Database Replication: Use read replicas to handle read-heavy traffic, reducing load on primary DB.
- Sharding: Split data across multiple databases by key ranges or hashes to handle large data volumes.
- Caching: Store frequent data in fast memory (e.g., Redis) to reduce database hits.
- Message Queues: Decouple components and smooth traffic spikes by asynchronous processing.
- CDNs: Cache static content closer to users to reduce bandwidth and latency.
- Multi-region Deployment: Place servers near users to reduce latency and improve availability.
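To make the sharding bullet concrete, here is a minimal hash-based sharding sketch in Python. The shard count and key format are illustrative assumptions, not from any real deployment; production systems often use consistent hashing so that adding shards moves less data.

```python
import hashlib

NUM_SHARDS = 4  # illustrative; real systems pick this from capacity math

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash (md5 for determinism)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always routes to the same shard, while different keys
# spread roughly evenly across all shards.
shard = shard_for("user:12345")
```

Note the trade-off: simple modulo hashing remaps most keys when `num_shards` changes, which is why consistent hashing is preferred when shards are added or removed frequently.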
Example for 1 million users with 1 request per second each:
- Requests per second: 1,000,000 QPS (too high for single DB)
- Database capacity: Single DB ~10,000 QPS -> Need ~100 DB shards or replicas
- Network bandwidth: 1 Gbps = 125 MB/s; 1M QPS with 1 KB payload = ~1 GB/s -> Need multiple network links
- Storage: TBs of data requiring distributed storage solutions
- Servers: Hundreds of app servers behind load balancers
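The numbers above can be reproduced with a short back-of-envelope calculation. All inputs are the assumptions already stated in this example (1 req/s per user, ~10,000 QPS per DB instance, ~1 KB payload, 1 Gbps links); they are estimates, not measurements.

```python
import math

users = 1_000_000
qps_per_user = 1                    # assumption: 1 request/second per user
total_qps = users * qps_per_user    # 1,000,000 QPS overall

db_capacity_qps = 10_000            # rough single-instance ceiling
db_shards_needed = math.ceil(total_qps / db_capacity_qps)

payload_bytes = 1024                # ~1 KB per response
bandwidth_bytes_per_s = total_qps * payload_bytes   # ~1 GB/s
link_capacity_bytes = 125_000_000   # 1 Gbps = 125 MB/s
links_needed = math.ceil(bandwidth_bytes_per_s / link_capacity_bytes)

print(total_qps, db_shards_needed, links_needed)
```

This confirms the ~100 shards figure and shows that a single 1 Gbps link falls far short of the ~1 GB/s of response traffic.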
How to structure your answer:
1. Describe the current system's limits at small scale.
2. Identify the first bottleneck as traffic grows.
3. Explain why that component breaks (e.g., the single-instance DB QPS limit).
4. Propose distributed solutions step by step, explaining how each solves a specific problem.
5. Use real numbers to justify your choices.
6. Close with trade-offs and monitoring needs.
Question: Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: Add read replicas first to offload read queries from the primary database, since most web workloads are read-heavy and replicas raise read capacity without changing the data model. Pair this with caching for hot reads. If write volume also grows, plan for sharding or for scaling the primary vertically.
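A minimal sketch of the read/write split behind that answer. The `execute()` connection interface is hypothetical; real drivers (psycopg2, JDBC, etc.) differ, and the naive SELECT check ignores replication lag, which a real router must account for.

```python
import random

class ReplicatedDB:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql, *params):
        # Writes must go to the primary; plain reads can go to any
        # replica, multiplying read capacity by the replica count.
        if sql.lstrip().upper().startswith("SELECT"):
            target = random.choice(self.replicas)
        else:
            target = self.primary
        return target.execute(sql, *params)
```

The design choice here is transparency: the application keeps one `execute()` call site, and capacity scales by adding replicas behind it.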
