| Users/Data Size | Horizontal Partitioning | Vertical Partitioning |
|---|---|---|
| 100 users / small data | Single database instance; simple queries | Single database; all columns together |
| 10,000 users / medium data | Split data rows across shards by user ID; reduces row size per shard | Split tables by columns (e.g., user info separate from logs); reduces column size per query |
| 1 million users / large data | Many shards; each handles subset of users; easier to scale writes | Tables split by feature sets; some queries need joins across vertical partitions |
| 100 million users / huge data | Hundreds of shards; complex routing; cross-shard queries costly | Multiple vertical partitions across clusters; increased join complexity; possible duplication |
Horizontal vs vertical partitioning in HLD - Scaling Approaches Compared
At small scale, the database handles all data easily. As users grow, the first bottleneck is usually the database storage and query performance.
Horizontal partitioning bottleneck: Single shard storage and write throughput limits.
Vertical partitioning bottleneck: Complex joins across partitions slow queries.
- Horizontal Partitioning: Add more shards to distribute rows; use consistent hashing or range-based sharding; implement routing layer to direct queries.
- Vertical Partitioning: Separate tables by feature or column groups; optimize queries to avoid unnecessary joins; replicate frequently joined partitions.
- Combine both: Use horizontal partitioning within vertical partitions for very large scale.
- Use caching layers to reduce database load.
- Use read replicas to scale read-heavy workloads.
Assuming 1 million users, each generating 10 requests per second:
- Total requests: 10 million QPS (queries per second).
- Single DB instance handles ~5,000 QPS → Need ~2,000 shards for horizontal partitioning.
- Storage per user: 1 MB → Total 1 TB data; split across shards or vertical tables.
- Network bandwidth: 1 Gbps = 125 MB/s; large data transfers require multiple network interfaces or data centers.
Start by explaining the difference: horizontal splits rows, vertical splits columns.
Discuss pros and cons of each with examples.
Identify bottlenecks based on data size and query patterns.
Suggest scaling steps logically: start with vertical if columns are large, horizontal if rows grow fast.
Mention combining both for complex systems.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Implement horizontal partitioning (sharding) to distribute load across multiple database instances, because a single instance cannot handle 10x the queries.