Database Normalization vs. Denormalization in HLD: Scaling Approaches Compared

| Users / Data Size | Normalized DB | Denormalized DB |
|---|---|---|
| 100 users | Simple joins, low redundancy, easy updates | Fast reads, some data duplication, simple queries |
| 10,000 users | More joins, moderate query complexity, good data integrity | Faster reads, larger storage, growing update complexity |
| 1 million users | Joins become expensive; queries slow; DB CPU load rises | Read-heavy workloads scale well; write/update latency increases |
| 100 million users | Joins cause major slowdowns; scaling is hard without sharding | Massive storage needs, complex update logic, caching essential |
In a normalized database, the first bottleneck is join operations on large tables: as data grows, joins consume more CPU and queries slow down.
In a denormalized database, the bottleneck is writes and updates: because data is duplicated, each update must touch multiple places, increasing latency and the risk of inconsistency.
- Normalized DB: Use read replicas to offload read queries, add indexes, and consider sharding large tables to reduce join costs.
- Denormalized DB: Use caching layers (like Redis) to speed reads, implement batch updates to reduce write overhead, and use event-driven mechanisms to keep data consistent.
- For both, horizontal scaling of database servers and application servers helps handle increased load.
- Use CDNs for static content to reduce database load indirectly.
Assuming 1 million users generating 10 requests per second (RPS):
- Normalized DB: even at 10 RPS, complex joins over large tables drive CPU load, so several read replicas (5-10) may be needed.
- Denormalized DB: reads dominate and stay fast, but writes are slower because of duplication; sustaining write throughput may require batching.
- Storage: the denormalized DB uses roughly 20-50% more storage because of duplicated data.
- Bandwidth: both need similar network bandwidth, though denormalized queries may return more data per request.
Start by explaining the trade-offs: normalization improves data integrity and reduces storage but can slow reads due to joins. Denormalization speeds up reads but complicates writes and increases storage.
Discuss workload patterns (read-heavy vs write-heavy) to decide which approach fits best.
Outline scaling strategies for each and mention how caching and sharding help.
Your database handles 1,000 QPS and traffic grows 10x to 10,000 QPS. What do you do first?
Answer: First determine whether the bottleneck is reads or writes. For a normalized DB, add read replicas and optimize indexes to absorb the extra read load. For a denormalized DB, add caching and batch updates to reduce write load.