
Database normalization vs denormalization in HLD - Scaling Approaches Compared

Scalability Analysis - Database normalization vs denormalization
Growth Table: Database Normalization vs Denormalization
| Users / Data Size | Normalized DB | Denormalized DB |
|---|---|---|
| 100 users | Simple joins, low redundancy, easy updates | Fast reads, some data duplication, simple queries |
| 10,000 users | More joins, moderate query complexity, good data integrity | Faster read queries, larger storage, update complexity grows |
| 1 million users | Joins become expensive, slower queries, DB CPU load rises | Read-heavy workloads scale well, write/update latency increases |
| 100 million users | Joins cause major slowdowns, DB scaling hard without sharding | Massive storage needs, complex update logic, caching essential |
First Bottleneck

In a normalized database, the first bottleneck is join operations on large tables: as data grows, joins consume more CPU and query latency climbs.

In a denormalized database, the bottleneck is write/update operations. Because data is duplicated, updates must modify multiple places, increasing latency and risk of inconsistency.
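The write-amplification problem above can be sketched in a few lines. This is an illustrative example, not a real schema: the `orders` records are hypothetical and duplicate the customer's name instead of referencing a separate customers table.

```python
# Hypothetical denormalized order records: the customer name is copied into
# every order row, so one logical change must touch many physical rows.
orders = [
    {"order_id": 1, "customer_id": 42, "customer_name": "Alice", "total": 30},
    {"order_id": 2, "customer_id": 42, "customer_name": "Alice", "total": 75},
    {"order_id": 3, "customer_id": 7,  "customer_name": "Bob",   "total": 12},
]

def rename_customer(orders, customer_id, new_name):
    """One logical update fans out to every duplicated copy."""
    touched = 0
    for order in orders:
        if order["customer_id"] == customer_id:
            order["customer_name"] = new_name
            touched += 1
    return touched  # write amplification: N rows updated for one rename

print(rename_customer(orders, 42, "Alicia"))  # 2 rows touched, not 1
```

In a normalized schema the same rename would be a single-row update to a customers table; here every duplicated copy is a chance for inconsistency if one write fails.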

Scaling Solutions
  • Normalized DB: Use read replicas to offload read queries, add indexes, and consider sharding large tables to reduce join costs.
  • Denormalized DB: Use caching layers (like Redis) to speed reads, implement batch updates to reduce write overhead, and use event-driven mechanisms to keep data consistent.
  • For both, horizontal scaling of database servers and application servers helps handle increased load.
  • Use CDNs for static content to reduce database load indirectly.
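The caching-layer bullet above follows the cache-aside pattern. A minimal sketch, assuming a plain dict stands in for Redis and `fetch_from_db` is a hypothetical placeholder for the real query:

```python
# Cache-aside sketch: check the cache first, fall back to the database on a
# miss, and populate the cache on the way out. A dict stands in for Redis.
cache = {}
db_reads = {"count": 0}

def fetch_from_db(user_id):
    db_reads["count"] += 1  # track how often we actually hit the database
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    if user_id in cache:            # cache hit: no database work
        return cache[user_id]
    row = fetch_from_db(user_id)    # cache miss: read through, then populate
    cache[user_id] = row
    return row

def update_user(user_id, new_name):
    # Write path (sketch): persist to the database first, then invalidate
    # the cached copy so the next read refetches fresh data.
    cache.pop(user_id, None)

get_user(1); get_user(1); get_user(1)
print(db_reads["count"])  # 1 -- repeated reads are served from the cache
```

Invalidating on write (rather than updating the cache in place) is the simpler choice and avoids serving a stale copy if the database write and cache write race.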
Back-of-Envelope Cost Analysis

Assuming 1 million users with 10 requests per second (RPS):

  • Normalized DB: 10 RPS with complex joins may require multiple read replicas (5-10) to handle CPU load.
  • Denormalized DB: 10 RPS mostly reads, but writes may be slower due to duplication; write throughput may need batching.
  • Storage: Denormalized DB uses 20-50% more storage due to duplicated data.
  • Bandwidth: Both require similar network bandwidth; denormalized may send more data per query.
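The estimates above can be reproduced as arithmetic. The per-replica capacity, join-cost factor, and baseline storage below are illustrative assumptions chosen to match the ranges stated, not measured numbers:

```python
# Back-of-envelope sketch using the figures above. All constants are assumed.
rps = 10
join_cost_factor = 5        # assumed: a join-heavy query costs ~5x a simple read
replica_capacity_rps = 10   # assumed: one replica sustains ~10 simple-read RPS

# Normalized DB: effective load is inflated by join cost
effective_rps = rps * join_cost_factor
replicas_needed = -(-effective_rps // replica_capacity_rps)  # ceiling division
print(replicas_needed)  # 5 -- the low end of the 5-10 replica range

# Denormalized DB: storage overhead from duplication (20-50% per the table)
base_storage_gb = 100       # assumed baseline for 1 million users
overhead_low, overhead_high = 0.20, 0.50
print(base_storage_gb * (1 + overhead_low),
      base_storage_gb * (1 + overhead_high))  # 120.0 150.0
```

The point of the exercise is the shape of the calculation, not the constants: in an interview, state your assumed per-replica capacity and cost factor out loud, then do the division.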
Interview Tip

Start by explaining the trade-offs: normalization improves data integrity and reduces storage but can slow reads due to joins. Denormalization speeds up reads but complicates writes and increases storage.

Discuss workload patterns (read-heavy vs write-heavy) to decide which approach fits best.

Outline scaling strategies for each and mention how caching and sharding help.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: First identify whether the bottleneck is reads or writes. For a normalized DB, add read replicas and optimize indexes to absorb the extra reads. For a denormalized DB, add a caching layer to absorb reads and batch updates to reduce write load.
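The triage step in that answer can be written as a small decision helper. The thresholds and action strings are illustrative assumptions, not a prescription:

```python
# Sketch: classify the dominant load before picking a first scaling move.
# Actions mirror the scaling solutions listed earlier in this section.
def first_scaling_step(read_qps: int, write_qps: int, normalized: bool) -> str:
    if read_qps >= write_qps:  # read-heavy workload
        return "add read replicas + indexes" if normalized else "add caching layer"
    # write-heavy workload
    return "shard large tables" if normalized else "batch updates + async events"

# 10,000 QPS total, read-heavy, normalized schema
print(first_scaling_step(read_qps=9000, write_qps=1000, normalized=True))
```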

Key Result
Normalized databases face join-related CPU bottlenecks as data grows, while denormalized databases face write/update complexity and storage overhead. Scaling requires read replicas and sharding for normalized DBs, and caching plus batch updates for denormalized DBs.