Why database scaling handles data growth in HLD - Scalability Evidence

Scalability Analysis - Why database scaling handles data growth
Growth Table: Data Growth Impact on Database
Users / Data Size  | 100 Users   | 10K Users            | 1M Users              | 100M Users
-------------------+-------------+----------------------+-----------------------+----------------------------
Data Volume        | Small (MBs) | Medium (GBs)         | Large (TBs)           | Very Large (PBs)
Database Load      | Low QPS (queries/sec) | Moderate QPS | High QPS            | Very High QPS
Response Time      | Fast (ms)   | Acceptable (tens ms) | Slower (hundreds ms)  | Slow (seconds)
Storage Needs      | Minimal     | Growing              | Large Disk Arrays     | Distributed Storage Systems
Backup & Recovery  | Simple      | Scheduled            | Complex               | Highly Automated
First Bottleneck: Database Storage and Query Performance

As data grows, database storage and query speed become the first limits. Tables get larger, so searches and updates touch more data and slow down. Disk space fills up. Indexes grow as well and may no longer fit in memory, so lookups hit disk more often. The result is slower responses and higher load on the database server.

Scaling Solutions for Database Data Growth
  • Vertical Scaling: Upgrade to bigger servers with more CPU, RAM, and faster disks to handle more data and queries.
  • Read Replicas: Create copies of the database to spread read queries and reduce load on the main database.
  • Sharding: Split data horizontally across multiple database servers by user ID or other keys to distribute storage and queries.
  • Caching: Use in-memory caches like Redis to store frequent query results and reduce database hits.
  • Archival: Move old or less-used data to cheaper storage to keep the main database smaller and faster.
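
Two of the ideas above, sharding by user ID and caching frequent reads, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: the shard count, the `shard_for` function, and the `CacheAsideStore` class are hypothetical names introduced here, and plain dictionaries stand in for the real database and for an in-memory cache such as Redis.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, chosen only for illustration


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string can differ between Python processes.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class CacheAsideStore:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for the real database
        self.cache = {}           # stand-in for an in-memory cache
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: no database work
        self.db_reads += 1              # cache miss: read from the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value     # populate the cache for next time
        return value

    def put(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)       # invalidate so readers see the new value


store = CacheAsideStore({"user:1": "Ada"})
store.get("user:1")  # first read goes to the database
store.get("user:1")  # second read is served from the cache
print("database reads:", store.db_reads)
print("shard for user-42:", shard_for("user-42"))
```

One trade-off worth naming: with simple modulo routing, changing the number of shards remaps most keys, which is why consistent hashing is often discussed as an alternative.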
Back-of-Envelope Cost Analysis

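Query load: Assuming 1M users generate 10 QPS each, the total is 10M QPS, which is unrealistic for one database. If a single PostgreSQL instance handles roughly 5K QPS (a rough figure that varies widely with hardware and query mix), 10M QPS requires about 2,000 instances, which in practice means sharding combined with caching and read replicas.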

Storage: 1M users at 1 GB each is 1 PB of data. That is far more than a single server's disks can hold, so it needs a distributed storage solution.

Bandwidth: A 1 Gbps network link supports about 125 MB/s. At that rate, copying 1 PB would take roughly 93 days, so large data transfers require multiple network interfaces or spreading data across data centers.
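
The arithmetic above can be reproduced with a short script. This is a sketch that simply restates the section's assumptions as constants; the function name `estimate` and the constant names are introduced here for illustration, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
import math

USERS = 1_000_000
QPS_PER_USER = 10           # assumption from the text
QPS_PER_INSTANCE = 5_000    # rough per-instance figure from the text
GB_PER_USER = 1             # assumption from the text
LINK_GBPS = 1               # network link speed in gigabits per second


def estimate():
    """Return the back-of-envelope numbers as a dictionary."""
    total_qps = USERS * QPS_PER_USER
    instances = math.ceil(total_qps / QPS_PER_INSTANCE)

    total_gb = USERS * GB_PER_USER
    total_pb = total_gb / 1_000_000          # decimal units: 1 PB = 1e6 GB

    link_mb_per_s = LINK_GBPS * 1000 / 8     # 8 bits per byte
    transfer_seconds = total_gb * 1000 / link_mb_per_s
    transfer_days = transfer_seconds / 86_400

    return {
        "total_qps": total_qps,
        "instances": instances,
        "total_pb": total_pb,
        "link_mb_per_s": link_mb_per_s,
        "transfer_days": transfer_days,
    }


for name, value in estimate().items():
    print(f"{name}: {value:,.1f}")
```

Changing any constant shows how sensitive the instance count and transfer time are to the starting assumptions, which is the main point of a back-of-envelope estimate.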

Interview Tip: Structuring Database Scaling Discussion

Start by explaining how data growth affects storage and query speed. Identify the first bottleneck (usually DB storage or query load). Then discuss vertical scaling, read replicas, sharding, and caching as solutions. Mention trade-offs and complexity added by each.

Self Check Question

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?

Key Result
Database scaling handles data growth by addressing storage limits and query performance bottlenecks through vertical scaling, read replicas, sharding, and caching to maintain fast responses as data and traffic increase.