
Blob storage (S3, Azure Blob) in HLD - Scalability & System Analysis

Scalability Analysis - Blob storage (S3, Azure Blob)
Growth Table: Blob Storage Scaling
| Users / Scale | 100 Users | 10,000 Users | 1 Million Users | 100 Million Users |
|---|---|---|---|---|
| Data Stored | Few GBs | Few TBs | Petabytes | Exabytes |
| Requests per Second (RPS) | ~100-500 RPS | ~5,000 RPS | ~100,000 RPS | Millions RPS |
| Latency | Low (ms) | Low to Moderate | Moderate (due to scale) | Needs optimization (CDN, edge) |
| Storage Management | Simple buckets | Multiple buckets, lifecycle policies | Multi-region replication, tiered storage | Global distribution, archival tiers |
| Cost | Low | Moderate | High | Very High |
First Bottleneck

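At small to medium scale, the first bottleneck is usually the request rate limit of the blob storage service. Each storage account, bucket, or key prefix has a limit on requests per second (S3, for example, documents at least 3,500 write and 5,500 read requests per second per prefix). When traffic exceeds the limit, the service throttles requests, which increases latency and causes errors.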

At larger scale, network bandwidth and storage management (handling huge data volumes and replication) become bottlenecks.
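
When throttling does occur, clients commonly retry with exponential backoff instead of resending immediately. The following is a minimal sketch of that idea, not any SDK's built-in behavior: `ThrottledError` and `call_with_backoff` are hypothetical names, and the delay values are illustrative assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling response (such as an HTTP 503 or 429)."""


def call_with_backoff(request, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call request(); on ThrottledError, wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the delay ceiling on each attempt, capped at max_delay,
            # then wait a random time below it so clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

Here `request` is any zero-argument callable that performs one storage call. The `sleep` parameter exists only so the wait can be replaced in tests.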

Scaling Solutions
  • Horizontal scaling: Use multiple buckets, storage accounts, or key prefixes to distribute load.
  • Caching: Use CDN (Content Delivery Network) to cache frequently accessed blobs closer to users, reducing direct storage requests.
  • Sharding: Partition data by user or region to spread load and storage.
  • Lifecycle policies: Automatically move older, rarely accessed data to cheaper, slower storage tiers (such as archive) to reduce cost.
  • Multi-region replication: Replicate data across regions for availability and reduced latency.
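  • Request batching: Combine multiple small requests into fewer larger requests where the API supports it (for example, multi-object deletes), or pack many small objects into one larger blob, to reduce per-request overhead.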
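
The horizontal scaling and sharding ideas above can be sketched as a stable mapping from a user ID to one of several buckets. This is an illustrative sketch only: the bucket names and the helper functions `shard_for` and `bucket_for_user` are assumptions, not part of any storage SDK.

```python
import hashlib
from typing import Sequence


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a shard index.
    return int.from_bytes(digest[:8], "big") % shard_count


def bucket_for_user(user_id: str, buckets: Sequence[str]) -> str:
    """Pick the bucket (or storage account) that holds this user's blobs."""
    return buckets[shard_for(user_id, len(buckets))]


# Hypothetical bucket names, for illustration only.
BUCKETS = ["media-0", "media-1", "media-2", "media-3"]
print(bucket_for_user("user-42", BUCKETS))
```

A cryptographic hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings. Note that with plain modulo hashing, changing the number of buckets remaps most keys, so the shard count should be chosen with growth in mind.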
Back-of-Envelope Cost Analysis

Assuming 1 million users, each uploading 1 MB per day:

  • Data stored per day: 1 million MB = ~1 TB/day
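  • Storage added per month: ~30 TB (1 TB/day × 30 days), accumulating month over month unless data is deleted or archived
  • Requests: If each user makes 10 requests/day, total 10 million requests/day (~116 requests/sec on average)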
  • Bandwidth: 1 TB/day upload + downloads (depends on usage)
  • Costs scale with storage size, request count, and bandwidth used.
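
The arithmetic above can be reproduced with a short script. It uses decimal units (1 TB = 1,000,000 MB) and a 30-day month, as the estimate does. The function name `estimate_load` is an assumption for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # decimal units, matching the estimate above


def estimate_load(users, upload_mb_per_user_per_day, requests_per_user_per_day,
                  days_per_month=30):
    """Return daily and monthly storage growth plus the average request rate."""
    daily_tb = users * upload_mb_per_user_per_day / MB_PER_TB
    daily_requests = users * requests_per_user_per_day
    return {
        "daily_tb": daily_tb,
        "monthly_tb": daily_tb * days_per_month,
        "daily_requests": daily_requests,
        "avg_rps": daily_requests / SECONDS_PER_DAY,
    }


result = estimate_load(users=1_000_000,
                       upload_mb_per_user_per_day=1,
                       requests_per_user_per_day=10)
print(result)
```

The request rate is an average over the whole day. Real traffic is rarely uniform, so capacity is better planned against an estimated peak than against this average.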
Interview Tip

When discussing blob storage scalability, start by estimating data size and request rates. Identify the first bottleneck (usually request limits or bandwidth). Then propose solutions like horizontal scaling, CDN caching, and lifecycle management. Always mention cost and availability trade-offs.

Self Check Question

Your blob storage service handles 1,000 requests per second. Traffic grows 10x to 10,000 RPS. What do you do first?

Answer: Distribute load by creating multiple storage buckets/accounts and use a CDN to cache content, reducing direct requests. This avoids throttling and scales request handling.
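
A rough way to size this answer is to estimate how much traffic a CDN absorbs and how many partitions (buckets, accounts, or prefixes) the rest requires. The sketch below uses assumed numbers: the per-partition limit of 1,000 RPS and the 80% cache hit ratio are hypothetical, not provider figures.

```python
import math


def origin_rps(total_rps, cdn_hit_ratio):
    """Requests per second that still reach blob storage after CDN caching."""
    return total_rps * (1 - cdn_hit_ratio)


def partitions_needed(rps, per_partition_limit):
    """Smallest number of partitions that keeps each one within its limit."""
    return math.ceil(rps / per_partition_limit)


TOTAL_RPS = 10_000
PER_PARTITION_LIMIT = 1_000  # hypothetical limit, for illustration only

without_cdn = partitions_needed(TOTAL_RPS, PER_PARTITION_LIMIT)
with_cdn = partitions_needed(origin_rps(TOTAL_RPS, 0.8), PER_PARTITION_LIMIT)
print(without_cdn, with_cdn)
```

Under these assumptions, 10 partitions are needed without a CDN and 2 with one, which shows why caching and horizontal distribution are usually proposed together. This assumes load is spread evenly; hot keys can still overload a single partition.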

Key Result
Blob storage scales well with data size but first breaks at request rate limits; horizontal scaling and CDN caching are key to handling high traffic.