Database Normalization vs. Denormalization in HLD: Scaling Approaches Compared

| Users / Data Size | Normalized DB | Denormalized DB |
|---|---|---|
| 100 users | Simple joins, low redundancy, easy updates | Fast reads, some data duplication, simple queries |
| 10,000 users | More joins, moderate query complexity, good data integrity | Faster reads, larger storage, growing update complexity |
| 1 million users | Joins become expensive; queries slow; DB CPU load rises | Read-heavy workloads scale well; write/update latency increases |
| 100 million users | Joins cause major slowdowns; scaling is hard without sharding | Massive storage needs, complex update logic, caching essential |
In a normalized database, the first bottleneck is join operations on large tables: as data grows, joins consume more CPU and queries slow down.
In a denormalized database, the bottleneck is writes and updates: because data is duplicated, each update must touch multiple places, increasing latency and the risk of inconsistency.
- Normalized DB: Use read replicas to offload read queries, add indexes, and consider sharding large tables to reduce join costs.
- Denormalized DB: Use caching layers (like Redis) to speed reads, implement batch updates to reduce write overhead, and use event-driven mechanisms to keep data consistent.
- For both, horizontal scaling of database servers and application servers helps handle increased load.
- Use CDNs for static content to reduce database load indirectly.
Assuming 1 million users generating 10 requests per second (RPS):
- Normalized DB: even at 10 RPS, complex joins over large tables drive CPU load, so several read replicas (5-10) may be needed.
- Denormalized DB: reads dominate and stay fast, but writes are slower because of duplication; sustaining write throughput may require batching.
- Storage: the denormalized DB uses roughly 20-50% more storage because of duplicated data.
- Bandwidth: both need similar network bandwidth, though denormalized queries may return more data per request.
Start by explaining the trade-offs: normalization improves data integrity and reduces storage but can slow reads due to joins. Denormalization speeds up reads but complicates writes and increases storage.
Discuss workload patterns (read-heavy vs write-heavy) to decide which approach fits best.
Outline scaling strategies for each and mention how caching and sharding help.
Your database handles 1,000 QPS and traffic grows 10x to 10,000 QPS. What do you do first?
Answer: First determine whether the bottleneck is reads or writes. For a normalized DB, add read replicas and optimize indexes to absorb the extra read load. For a denormalized DB, add caching and batch updates to reduce write load.