Storage access patterns in HLD - Scalability & System Analysis

| Users / Data Size | 100 Users | 10K Users | 1M Users | 100M Users |
|---|---|---|---|---|
| Read Requests per Second | ~100-500 | ~10K | ~1M | ~100M |
| Write Requests per Second | ~50-200 | ~5K | ~500K | ~50M |
| Data Size | GBs | TBs | PBs | Exabytes |
| Access Pattern | Mostly sequential, low concurrency | Mixed sequential and random, moderate concurrency | Highly random, high concurrency | Highly random, extreme concurrency, multi-region |
| Latency Sensitivity | Low to moderate | Moderate | High | Very high |
| Storage Type | Local disks or single DB | Distributed storage, caching introduced | Sharded DBs, SSDs, caching layers | Multi-region distributed storage, tiered storage |
At small scale (~100 users), storage access is simple: local disks or a single database handle the load comfortably.
At medium scale (10K-1M users), the first bottleneck is storage I/O throughput and latency. Increasingly random access patterns cause disk-seek delays and slow queries.
At large scale (100M users), the bottleneck shifts to storage-system scalability and the network bandwidth between distributed storage nodes.
Common mitigations, matched to these bottlenecks:
- Caching: Use in-memory caches (Redis, Memcached) to reduce read load on storage.
- Sharding: Split data horizontally across multiple storage nodes to distribute load.
- Data Partitioning: Organize data by access patterns to optimize sequential reads and reduce random I/O.
- Tiered Storage: Use fast SSDs for hot data and slower disks for cold data.
- Replication: Use read replicas to scale read throughput and improve availability.
- CDN: For static content, use Content Delivery Networks to reduce storage access and latency.
- Compression and Compaction: Reduce storage size and improve I/O efficiency.
- Asynchronous Writes: Buffer writes to reduce write latency and batch operations.
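The caching and sharding items above can be sketched together. This is a minimal illustration, not a production pattern: a hash-based shard picker plus a cache-aside read path, with plain dicts standing in for Redis and the shard databases (names like `pick_shard` and `CacheAsideStore` are hypothetical):

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the sketch

def pick_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based sharding: deterministically map a key to one shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

class CacheAsideStore:
    """Cache-aside reads: check the cache first, fall back to the key's
    shard on a miss, then populate the cache for later reads."""

    def __init__(self, shards):
        self.shards = shards   # list of dicts standing in for shard DBs
        self.cache = {}        # dict standing in for Redis/Memcached

    def read(self, key: str):
        if key in self.cache:                      # cache hit: no storage I/O
            return self.cache[key]
        value = self.shards[pick_shard(key)].get(key)  # miss: go to the shard
        if value is not None:
            self.cache[key] = value                # warm the cache
        return value

    def write(self, key: str, value):
        self.shards[pick_shard(key)][key] = value
        self.cache.pop(key, None)                  # invalidate, don't update
```

Note the write path invalidates the cache entry rather than updating it; invalidation avoids serving a stale value when two writers race.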
Assuming 1M users with 1M read and 500K write requests per second:
- Storage IOPS needed: ~1.5 million (reads + writes combined)
- Network bandwidth: For 1KB average request size, ~1.5 GB/s (~12 Gbps)
- Storage capacity: Petabyte scale, depending on data retention
- Cache size: Tens to hundreds of GBs to hold hot data
- Number of storage nodes: Hundreds to thousands, depending on node capacity
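The back-of-envelope numbers above can be checked with a few lines of arithmetic, using the same 1 KB (decimal) average request size assumed in the list:

```python
reads_per_sec = 1_000_000
writes_per_sec = 500_000
avg_request_bytes = 1000  # assumed 1 KB average request (decimal, as above)

total_iops = reads_per_sec + writes_per_sec           # 1.5M operations/s
bandwidth_bytes = total_iops * avg_request_bytes      # bytes per second
bandwidth_gbps = bandwidth_bytes * 8 / 1e9            # convert to Gbps

print(f"IOPS: {total_iops:,}")                        # 1,500,000
print(f"Throughput: {bandwidth_bytes / 1e9:.1f} GB/s")  # 1.5 GB/s
print(f"Bandwidth: {bandwidth_gbps:.0f} Gbps")        # 12 Gbps
```

This confirms the ~1.5 GB/s (~12 Gbps) figure; real deployments would add headroom for replication traffic and peak-to-average ratios.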
When discussing the scalability of storage access patterns, start by describing the current access pattern (sequential vs random, read/write ratio).
Explain how these patterns affect storage I/O and latency.
Identify the bottleneck clearly (I/O throughput, latency, network).
Then propose targeted solutions like caching, sharding, replication, and tiered storage.
Always connect solutions to the specific bottleneck caused by access patterns.
Sample question: Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Add read replicas and implement caching to reduce load on the primary database before considering sharding or hardware upgrades.
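The answer above can be sketched as a simple read/write router. This is a minimal illustration under stated assumptions: one primary, a pool of replicas served round-robin, and naive synchronous replication (class and method names are hypothetical, and real replication is asynchronous with lag):

```python
import itertools

class ReplicatedRouter:
    """Send writes to the primary and spread reads round-robin across
    read replicas, taking read load off the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary     # dict standing in for the primary DB
        self.replicas = replicas   # dicts standing in for read replicas
        self._rr = itertools.cycle(range(len(replicas)))

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:     # simplified synchronous replication
            replica[key] = value

    def read(self, key):
        # Round-robin read: each replica serves 1/N of the read traffic.
        return self.replicas[next(self._rr)].get(key)
```

With N replicas, each one sees roughly 1/N of the read QPS, which is why replicas (plus caching) are the first lever before sharding or hardware upgrades.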