
Storage access patterns in HLD - Scalability & System Analysis

Scalability Analysis - Storage access patterns
Growth Table: Storage Access Patterns
| Metric | 100 Users | 10K Users | 1M Users | 100M Users |
|---|---|---|---|---|
| Read Requests per Second | ~100-500 | ~10K | ~1M | ~100M |
| Write Requests per Second | ~50-200 | ~5K | ~500K | ~50M |
| Data Size | GBs | TBs | PBs | Exabytes |
| Access Pattern | Mostly sequential, low concurrency | Mixed sequential and random, moderate concurrency | Highly random, high concurrency | Highly random, extreme concurrency, multi-region |
| Latency Sensitivity | Low to moderate | Moderate | High | Very high |
| Storage Type | Local disks or single DB | Distributed storage, caching introduced | Sharded DBs, SSDs, caching layers | Multi-region distributed storage, tiered storage |
First Bottleneck

At small scale (100 users), storage access is simple: local disks or a single database handle requests well.

At medium scale (10K-1M users), the first bottleneck is storage I/O throughput and latency: random access patterns cause disk seek delays and slow queries.

At large scale (100M users), the bottleneck shifts to storage system scalability and network bandwidth between distributed storage nodes.
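A quick sanity check makes the medium-scale bottleneck concrete. The per-disk figure below is an assumed, typical value (not from this section): a single 7200 RPM HDD sustains on the order of 100-200 random IOPS, even though it can stream 100+ MB/s sequentially.

```python
# Back-of-envelope: why random I/O saturates first.
# Assumed figure: ~150 random IOPS for one spinning disk.
HDD_RANDOM_IOPS = 150
READS_PER_SEC_10K_USERS = 10_000  # read load from the growth table

# How many disks would random reads alone require?
disks_needed = READS_PER_SEC_10K_USERS / HDD_RANDOM_IOPS
print(f"Disks needed for random reads alone: ~{disks_needed:.0f}")
```

Under these assumptions a single disk falls short by roughly 67x, which is why random access, not CPU or memory, is the first wall you hit.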

Scaling Solutions
  • Caching: Use in-memory caches (Redis, Memcached) to reduce read load on storage.
  • Sharding: Split data horizontally across multiple storage nodes to distribute load.
  • Data Partitioning: Organize data by access patterns to optimize sequential reads and reduce random I/O.
  • Tiered Storage: Use fast SSDs for hot data and slower disks for cold data.
  • Replication: Use read replicas to scale read throughput and improve availability.
  • CDN: For static content, use Content Delivery Networks to reduce storage access and latency.
  • Compression and Compaction: Reduce storage size and improve I/O efficiency.
  • Asynchronous Writes: Buffer writes to reduce write latency and batch operations.
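The caching item above is usually implemented as the cache-aside pattern. A minimal sketch, using a plain dict as a stand-in for Redis/Memcached and a function as a stand-in for the database; all names here are illustrative, not a specific library's API.

```python
cache = {}  # stand-in for an in-memory cache such as Redis

def db_read(key):
    # Stand-in for a slow storage read.
    return f"value-for-{key}"

def cached_read(key):
    if key in cache:          # cache hit: no storage I/O
        return cache[key]
    value = db_read(key)      # cache miss: read from storage...
    cache[key] = value        # ...then populate the cache
    return value

cached_read("user:42")  # first call misses and hits storage
cached_read("user:42")  # second call is served from memory
```

In practice a real cache entry would also carry a TTL and an eviction policy (e.g. LRU) so hot data displaces cold data.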
Back-of-Envelope Cost Analysis

Assuming 1M users with 1M read and 500K write requests per second:

  • Storage IOPS needed: ~1.5 million per second (reads + writes)
  • Network bandwidth: For 1KB average request size, ~1.5 GB/s (~12 Gbps)
  • Storage capacity: Petabyte scale, depending on data retention
  • Cache size: Tens to hundreds of GBs to hold hot data
  • Number of storage nodes: Hundreds to thousands, depending on node capacity
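The arithmetic behind these bullets can be reproduced directly. The per-node IOPS capacity below is an assumption for illustration; the workload numbers come from the section.

```python
# Back-of-envelope cost analysis for 1M users.
reads_per_sec = 1_000_000
writes_per_sec = 500_000
avg_request_bytes = 1024   # 1 KB average request size
node_iops = 5_000          # assumed sustained IOPS per storage node

total_iops = reads_per_sec + writes_per_sec        # 1.5M IOPS
bandwidth_gbps = total_iops * avg_request_bytes * 8 / 1e9

print(f"IOPS needed: {total_iops:,}")              # 1,500,000
print(f"Bandwidth: ~{bandwidth_gbps:.1f} Gbps")    # ~12.3 Gbps
print(f"Storage nodes: ~{total_iops // node_iops}")
```

With the assumed 5K IOPS per node this lands at ~300 nodes, i.e. the "hundreds" end of the estimate; slower nodes or replication push it toward thousands.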
Interview Tip

When discussing the scalability of storage access patterns, start by describing the current access pattern (sequential vs. random, read/write ratio).

Explain how these patterns affect storage I/O and latency.

Identify the bottleneck clearly (I/O throughput, latency, network).

Then propose targeted solutions like caching, sharding, replication, and tiered storage.

Always connect solutions to the specific bottleneck caused by access patterns.

Self Check

Your database handles 1000 QPS. Traffic grows 10x. What do you do first?

Answer: Add read replicas and implement caching to reduce load on the primary database before considering sharding or hardware upgrades.
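The "replicas plus caching first" answer can be sketched as a simple routing rule: writes go to the primary, hot reads are served from the cache, and cold reads are spread across replicas. Names and topology here are illustrative, not a real driver's API.

```python
import random

REPLICAS = ["replica-1", "replica-2", "replica-3"]
cache = {}  # stand-in for an in-memory cache

def route(query_type, key=None):
    """Pick the node that should serve this query."""
    if query_type == "write":
        return "primary"            # writes always go to the primary
    if key in cache:
        return "cache"              # hot reads never touch a database
    return random.choice(REPLICAS)  # cold reads spread across replicas

cache["user:42"] = "..."
route("write")            # -> "primary"
route("read", "user:42")  # -> "cache"
```

Both steps are operationally cheap compared with sharding, which changes the data model and the write path.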

Key Result
Storage access patterns impact I/O throughput and latency; caching and sharding are key to scaling as user and data volume grow.