
Why choosing the right storage matters in HLD - Scalability Evidence

Scalability Analysis - Why choosing the right storage matters
Growth Table: Storage Needs at Different Scales
Users | Data Volume | Request Rate | Storage Type Impact
--- | --- | --- | ---
100 | MBs to GBs | Low (a few QPS) | Simple local disk or single DB instance works
10,000 | GBs to TBs | Hundreds to thousands of QPS | Need scalable DB, caching, and faster storage (SSD)
1,000,000 | TBs to PBs | Thousands to tens of thousands of QPS | Distributed storage, sharding, and tiered storage needed
100,000,000 | PBs+ | Hundreds of thousands of QPS+ | Highly distributed, multi-region storage with CDN and archival
First Bottleneck: Storage Performance and Scalability

At small scale, storage is simple and fast enough. As users and data grow, the storage system becomes the first bottleneck because:

  • Disk I/O limits read/write speed.
  • Single database instances can't handle high query rates.
  • Storage capacity limits total data size.
  • Latency increases with data size and complexity.

Choosing the right storage type early prevents slowdowns and outages as traffic grows.
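
As a minimal sketch of the limits listed above, the following compares a workload against the ceilings of a single storage node and reports which ones are exceeded. The class names, the function `exceeded_limits`, and every number (5,000 QPS, 4 TB, 100,000 IOPS) are illustrative assumptions, not measurements of any real system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Workload:
    qps: float       # peak queries per second
    data_tb: float   # total data size in terabytes
    iops: float      # disk operations per second the workload needs


@dataclass
class NodeLimits:
    max_qps: float       # assumed query ceiling of one database instance
    max_data_tb: float   # assumed usable disk capacity of one node
    max_iops: float      # assumed IOPS ceiling of the node's disk


def exceeded_limits(w: Workload, n: NodeLimits) -> list[str]:
    """Return the names of the single-node limits the workload exceeds."""
    checks = {
        "query rate": (w.qps, n.max_qps),
        "capacity": (w.data_tb, n.max_data_tb),
        "disk IOPS": (w.iops, n.max_iops),
    }
    return [name for name, (need, limit) in checks.items() if need > limit]


# Hypothetical single node and two hypothetical workloads.
node = NodeLimits(max_qps=5_000, max_data_tb=4, max_iops=100_000)
small = Workload(qps=50, data_tb=0.01, iops=200)
large = Workload(qps=10_000, data_tb=10, iops=40_000)

print(exceeded_limits(small, node))  # []
print(exceeded_limits(large, node))  # ['query rate', 'capacity']
```

An empty result means a single node is still sufficient; each name in a non-empty result points at the limit that a scaling strategy has to address.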

Scaling Solutions for Storage
  • Vertical scaling: Upgrade to faster disks (SSD, NVMe) and more RAM for caching.
  • Horizontal scaling: Use database sharding to split data across servers.
  • Caching: Add in-memory caches (Redis, Memcached) to reduce DB load.
  • Tiered storage: Store hot data on fast storage, cold data on cheaper, slower disks.
  • Distributed storage: Use systems such as Cassandra or HDFS for large-scale data.
  • Content Delivery Network (CDN): Offload static content to edge servers to reduce read and bandwidth load on the origin storage.
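
Two of the techniques above, sharding and caching, can be sketched with a toy in-memory example. This is only an illustration: `ShardedStore` and `CacheAside` are hypothetical classes, plain dictionaries stand in for database servers and for a cache such as Redis, and the routing uses simple hash-modulo sharding.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that splits keys across in-memory 'shards'."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate database server.
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str):
        return self.shards[self.shard_for(key)].get(key)


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the store."""

    def __init__(self, store: ShardedStore) -> None:
        self.store = store
        self.cache = {}   # stands in for an in-memory cache
        self.hits = 0
        self.misses = 0

    def get(self, key: str):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value   # populate the cache on a miss
        return value

    def put(self, key: str, value) -> None:
        self.store.put(key, value)
        self.cache.pop(key, None)     # invalidate any stale cached copy


store = ShardedStore(num_shards=4)
reader = CacheAside(store)
reader.put("user:1", {"name": "Ada"})
reader.get("user:1")   # miss: read from the shard, then cached
reader.get("user:1")   # hit: served from the cache
print(reader.hits, reader.misses)  # 1 1
```

One design caveat: with hash-modulo routing, changing the number of shards remaps most keys, which is why consistent hashing is a common alternative when shards are added or removed often.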
Back-of-Envelope Cost Analysis

Example for 1 million users:

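  • Requests per second (QPS): ~2,800 on average (1,000,000 users × 10 requests/user/hour ÷ 3,600 s); plan for ~10,000 at peak, assuming peak traffic is roughly 3.6x the average
  • Storage needed: ~10 TB (assuming ~10 MB of data per user; at 10 KB per user the total would be only ~10 GB)
  • Bandwidth: ~100 MB/s at peak (10,000 requests/s × 10 KB/request)
  • Single DB instance max QPS: ~5,000 (assumed) → need at least 2 DB servers or sharding at peak
  • Disk IOPS: a single SSD can handle on the order of 100,000 IOPS, enough for this scale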
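
A back-of-envelope estimate like this is easy to check in code. The sketch below is one way to do the arithmetic, using its own stated assumptions: 10 requests per user per hour, a peak-to-average factor of 3.6, 10 MB of stored data per user, 10 KB per request, decimal units (1 TB = 10^12 bytes), and a 5,000 QPS ceiling per database instance. The function name `estimate` and all parameter names are hypothetical.

```python
import math

SECONDS_PER_HOUR = 3600


def estimate(users, requests_per_user_per_hour, peak_factor,
             bytes_per_user, bytes_per_request, max_qps_per_db):
    """Rough capacity estimate; all inputs are assumptions, not measurements."""
    avg_qps = round(users * requests_per_user_per_hour / SECONDS_PER_HOUR)
    peak_qps = round(users * requests_per_user_per_hour * peak_factor
                     / SECONDS_PER_HOUR)
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "storage_tb": users * bytes_per_user / 1e12,
        "bandwidth_mb_s": peak_qps * bytes_per_request / 1e6,
        # Servers needed for query rate alone, ignoring replication.
        "db_servers": math.ceil(peak_qps / max_qps_per_db),
    }


result = estimate(
    users=1_000_000,
    requests_per_user_per_hour=10,
    peak_factor=3.6,               # assumed peak-to-average ratio
    bytes_per_user=10_000_000,     # assumed 10 MB per user
    bytes_per_request=10_000,      # assumed 10 KB per request
    max_qps_per_db=5_000,          # assumed single-instance ceiling
)
for name, value in result.items():
    print(f"{name}: {value}")
```

Changing one input at a time (for example, the per-user data size) shows quickly which assumption dominates the result.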
Interview Tip: Structuring Your Scalability Discussion

When asked about storage scalability:

  1. Start by estimating data size and request rates at different user scales.
  2. Identify the storage bottleneck (capacity, IOPS, latency).
  3. Discuss vertical and horizontal scaling options.
  4. Mention caching and tiered storage to optimize performance and cost.
  5. Explain trade-offs and when to use distributed storage or CDNs.
Self Check Question

Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first, and why?

Key Result
Choosing the right storage early is critical because storage performance and capacity become the first bottleneck as user count and data volume grow. Proper scaling strategies like sharding, caching, and tiered storage prevent slowdowns and outages.