Media storage and CDN in HLD - Scalability & System Analysis

Scalability Analysis - Media storage and CDN
Growth Table: Media Storage and CDN Scaling
| Users | Media Storage | CDN Usage | Network Traffic | Latency |
| --- | --- | --- | --- | --- |
| 100 users | Single storage server, local disk | Minimal or no CDN | Low, direct fetch from storage | Low latency, direct access |
| 10,000 users | Distributed storage cluster, object storage | Basic CDN with few edge nodes | Moderate, some caching | Improved latency via CDN edges |
| 1,000,000 users | Highly scalable object storage (S3-like), multi-region | Global CDN with many edge locations | High, CDN offloads origin | Low latency globally |
| 100,000,000 users | Multi-cloud, geo-redundant storage, sharded data | Advanced CDN with dynamic content optimization | Very high, optimized delivery | Consistent low latency worldwide |
First Bottleneck

At small scale (~100 users), the single storage server's disk I/O and network bandwidth limit throughput.

At medium scale (~10K users), origin storage bandwidth and read latency become the bottlenecks.

At large scale (1M+ users), CDN edge cache capacity and the cache miss rate determine how much traffic falls through to the origin.

Without a CDN, every request reaches the origin, so traffic spikes can overwhelm the origin servers.
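To make the link between edge cache capacity, miss rate, and origin load concrete, here is a minimal sketch of a single edge node modeled as an LRU cache. The `EdgeCache` class, the `origin` function, and the file names are hypothetical and chosen for illustration only; real CDNs use more elaborate eviction and admission policies.

```python
from collections import OrderedDict


class EdgeCache:
    """Tiny LRU cache standing in for one CDN edge node (hypothetical model)."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity                  # max number of objects held
        self.fetch_from_origin = fetch_from_origin
        self.items = OrderedDict()                # key -> cached bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            # Cache hit: serve from the edge and mark the entry as recently used.
            self.items.move_to_end(key)
            self.hits += 1
            return self.items[key]
        # Cache miss: the request falls through to the origin.
        self.misses += 1
        value = self.fetch_from_origin(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            # Evict the least recently used entry.
            self.items.popitem(last=False)
        return value


origin_requests = []


def origin(key):
    """Stand-in for the origin storage; records every request it receives."""
    origin_requests.append(key)
    return f"bytes-of-{key}"


edge = EdgeCache(capacity=2, fetch_from_origin=origin)
for key in ["a.mp4", "b.mp4", "a.mp4", "c.mp4", "b.mp4"]:
    edge.get(key)

print(f"hits={edge.hits} misses={edge.misses} origin_requests={len(origin_requests)}")
```

With a capacity of two objects and three distinct files in rotation, most requests miss and reach the origin, which is the effect an undersized edge cache has at large scale.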

Scaling Solutions
  • Horizontal scaling: Add more storage nodes and CDN edge servers to distribute load.
  • Caching: Use CDN to cache media close to users, reducing origin load.
  • Sharding: Partition media storage by region or content type to improve access speed.
  • Multi-region replication: Store copies of media in multiple geographic locations.
  • Compression and optimization: Reduce media size for faster delivery.
  • Load balancing: Distribute requests evenly across storage and CDN nodes.
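The sharding and load-balancing ideas above can be sketched as a small routing function: pick the storage pool by region, then spread media IDs across the nodes in that pool with a hash. The node names, the `SHARDS_BY_REGION` map, and `pick_shard` are assumptions made for this example, not part of any particular storage system.

```python
import hashlib

# Hypothetical storage nodes, partitioned by region.
SHARDS_BY_REGION = {
    "us": ["us-store-0", "us-store-1"],
    "eu": ["eu-store-0", "eu-store-1", "eu-store-2"],
}


def pick_shard(region, media_id):
    """Return the storage node responsible for media_id within a region."""
    nodes = SHARDS_BY_REGION[region]
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(media_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


print(pick_shard("eu", "video-123"))
```

Simple modulo hashing remaps most keys whenever the number of nodes changes; consistent hashing is a common alternative when nodes are added or removed frequently.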
Back-of-Envelope Cost Analysis

Assuming 1M users, each streaming one 5 MB video per day:

  • Requests per second (QPS): ~12 on average (1M users * 1 request / 86,400 seconds); peak load will be higher
  • Daily data transfer: 5 TB (1M * 5 MB)
  • Bandwidth needed at origin: Reduced by the CDN cache hit ratio (e.g., a 90% hit ratio reduces origin transfer to 0.5 TB per day)
  • Storage needed: Depends on upload volume and retention, e.g., if 5 TB of new media is stored per day, 30 days of retention = 150 TB
  • Network bandwidth: CDN edges handle most of the traffic; without caching, origin bandwidth is the bottleneck
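The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch that uses decimal units (1 TB = 1,000,000 MB) and assumes traffic is spread evenly over the day; the constant names are chosen for this example.

```python
USERS = 1_000_000
VIDEOS_PER_USER_PER_DAY = 1
VIDEO_SIZE_MB = 5
CACHE_HIT_RATIO = 0.90        # assumed CDN hit ratio from the example
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000         # decimal units

requests_per_day = USERS * VIDEOS_PER_USER_PER_DAY

# Average request rate, assuming evenly spread traffic.
avg_qps = requests_per_day / SECONDS_PER_DAY

# Total data delivered to users per day.
daily_transfer_tb = requests_per_day * VIDEO_SIZE_MB / MB_PER_TB

# Only cache misses reach the origin.
origin_transfer_tb = daily_transfer_tb * (1 - CACHE_HIT_RATIO)

# Average delivery rate in megabits per second (8 bits per byte).
avg_egress_mbps = requests_per_day * VIDEO_SIZE_MB * 8 / SECONDS_PER_DAY

print(f"average QPS:           {avg_qps:.1f}")
print(f"daily transfer:        {daily_transfer_tb:.1f} TB")
print(f"origin transfer:       {origin_transfer_tb:.2f} TB/day")
print(f"average egress:        {avg_egress_mbps:.0f} Mbit/s")
```

Changing `CACHE_HIT_RATIO` shows how sensitive origin transfer is to the hit ratio: dropping from 90% to 80% doubles the load on the origin.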
Interview Tip

Start by defining user scale and media size.

Identify origin storage limits and CDN role early.

Discuss caching strategies and geographic distribution.

Explain how to handle cache misses and data consistency.

Always mention cost and latency trade-offs.

Self Check Question

Your media storage origin handles 1000 QPS. Traffic grows 10x. What do you do first?

Answer: Deploy or increase CDN edge caching to offload origin servers and reduce direct requests, preventing origin overload.
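A quick way to size that answer is to compute the cache hit ratio needed to keep origin traffic at its current level. The sketch below assumes the 1000 QPS figure is the origin's capacity, which the question does not state explicitly; `required_hit_ratio` is a hypothetical helper.

```python
def required_hit_ratio(total_qps, origin_capacity_qps):
    """Minimum CDN hit ratio so that cache misses stay within origin capacity."""
    if total_qps <= origin_capacity_qps:
        return 0.0  # the origin can absorb all traffic without caching
    return 1 - origin_capacity_qps / total_qps


# Traffic grows 10x from 1000 QPS; assume the origin can still serve 1000 QPS.
ratio = required_hit_ratio(total_qps=10_000, origin_capacity_qps=1_000)
print(f"required hit ratio: {ratio:.0%}")
```

Under these assumptions the CDN has to absorb about 90% of requests; if that hit ratio is not achievable for the content mix, the origin also needs horizontal scaling.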

Key Result
Media storage scales by adding distributed storage and using CDN caching to offload origin servers; the first bottleneck is origin bandwidth and disk I/O, which CDN caching and horizontal scaling address.