
Blob storage (S3, Azure Blob) in HLD - System Design Guide

Problem Statement
Storing and retrieving large amounts of unstructured data like images, videos, or backups on traditional file systems or databases leads to poor scalability, slow access, and high maintenance overhead. Without a specialized storage system, handling massive files reliably and efficiently becomes a bottleneck and risks data loss.
Solution
Blob storage systems store data as objects (blobs) in a flat namespace, allowing easy scaling by distributing data across many servers. They provide APIs to upload, download, and manage blobs with high durability and availability. Data is replicated and stored with metadata, enabling efficient access and management of large unstructured files.
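To make the "flat namespace plus metadata" idea concrete, here is a minimal in-memory sketch. The class and method names (`InMemoryBlobStore`, `put`, `get`, `list_keys`) are hypothetical and do not correspond to any real SDK. It assumes whole-object writes and treats slashes in keys as ordinary characters, not directories.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Blob:
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)
    etag: str = ""  # content hash, used here as a simple version identifier


class InMemoryBlobStore:
    """Hypothetical toy blob store: a single flat mapping of key -> Blob.

    A key such as "photos/2024/cat.jpg" is an opaque string; the slashes
    do not create real directories.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, Blob] = {}

    def put(self, key: str, data: bytes, metadata: dict[str, str] | None = None) -> str:
        # Whole-object write: a new put replaces any existing blob under the same key.
        etag = hashlib.sha256(data).hexdigest()
        self._blobs[key] = Blob(data, dict(metadata or {}), etag)
        return etag

    def get(self, key: str) -> Blob:
        # Raises KeyError for a missing key (roughly an HTTP 404 in a real service).
        return self._blobs[key]

    def delete(self, key: str) -> None:
        self._blobs.pop(key, None)

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Listing" is just prefix filtering over the flat namespace.
        return sorted(k for k in self._blobs if k.startswith(prefix))


store = InMemoryBlobStore()
store.put("photos/2024/cat.jpg", b"<jpeg bytes>", {"content-type": "image/jpeg"})
store.put("backups/db-2024-01-01.tar", b"<tar bytes>")
print(store.list_keys("photos/"))  # ['photos/2024/cat.jpg']
print(store.get("photos/2024/cat.jpg").metadata["content-type"])  # image/jpeg
```

A real service adds authentication, HTTP transport, and persistent, replicated storage behind the same basic put/get/list/delete shape.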
Architecture
[Diagram: Client App (Upload/Read) → Blob Storage → Metadata Store]

This diagram shows a client interacting with a blob storage service API, which manages metadata and stores data across multiple replicated storage nodes for durability and availability.
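The flow in the diagram can be sketched as an API layer that writes each blob to several storage nodes, waits for a quorum of acknowledgements, and records the replica locations in a metadata store. This is a simplified illustration under stated assumptions: the names `StorageNode` and `BlobService`, the hash-then-consecutive-nodes placement, and the default of 3 replicas with a write quorum of 2 are all hypothetical choices, not how any particular provider works.

```python
from __future__ import annotations

import hashlib


class StorageNode:
    """Hypothetical storage node that keeps blob bytes in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True  # flip to False to simulate a failed node
        self.data: dict[str, bytes] = {}

    def write(self, key: str, payload: bytes) -> bool:
        if not self.up:
            return False
        self.data[key] = payload
        return True

    def read(self, key: str) -> bytes | None:
        return self.data.get(key) if self.up else None


class BlobService:
    """Hypothetical API layer: places replicas and records them in a metadata store."""

    def __init__(self, nodes: list[StorageNode], replicas: int = 3, write_quorum: int = 2) -> None:
        if not 1 <= write_quorum <= replicas <= len(nodes):
            raise ValueError("need 1 <= write_quorum <= replicas <= number of nodes")
        self.nodes = nodes
        self.replicas = replicas
        self.write_quorum = write_quorum
        self.metadata: dict[str, dict] = {}  # stands in for the Metadata Store

    def _placement(self, key: str) -> list[StorageNode]:
        # Deterministic placement: hash the key, then take consecutive nodes.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def upload(self, key: str, payload: bytes) -> None:
        acked = [node for node in self._placement(key) if node.write(key, payload)]
        if len(acked) < self.write_quorum:
            # A real system would also clean up partial writes; this sketch does not.
            raise OSError(f"only {len(acked)} replicas acknowledged {key!r}")
        self.metadata[key] = {"size": len(payload), "replicas": [n.name for n in acked]}

    def download(self, key: str) -> bytes:
        by_name = {n.name: n for n in self.nodes}
        for name in self.metadata[key]["replicas"]:
            payload = by_name[name].read(key)
            if payload is not None:  # first healthy replica wins
                return payload
        raise OSError(f"no healthy replica holds {key!r}")


nodes = [StorageNode(f"node-{i}") for i in range(5)]
service = BlobService(nodes)
service.upload("videos/intro.mp4", b"\x00\x01\x02")
first = service.metadata["videos/intro.mp4"]["replicas"][0]
next(n for n in nodes if n.name == first).up = False  # simulate a node failure
print(service.download("videos/intro.mp4"))  # still served from another replica
```

Requiring a quorum rather than all replicas lets uploads succeed while one node is down, at the cost of some replicas briefly missing the data.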

Trade-offs
✓ Pros
Highly scalable storage for massive unstructured data without complex file hierarchies.
Built-in data replication ensures durability and availability even if some nodes fail.
Simple API for uploading, downloading, and managing blobs abstracts infrastructure complexity.
Supports metadata tagging for efficient data organization and retrieval.
✗ Cons
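Some blob stores, or read paths such as geo-replicated secondary copies, are only eventually consistent, so a read shortly after a write can return stale data. (Amazon S3, for example, has provided strong read-after-write consistency since December 2020.)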
Higher latency compared to local file systems due to network and replication overhead.
Costs grow with stored volume, request counts, and outbound data transfer (egress), so large, frequently accessed datasets can become expensive.
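The stale-read risk from eventual consistency can be illustrated with a toy store that acknowledges a write once the primary copy has it and copies it to a replica later. `AsyncReplicatedBlob` and its explicit `replicate()` step are hypothetical; a real system would replicate in the background with some lag instead of on demand.

```python
from __future__ import annotations

from collections import deque


class AsyncReplicatedBlob:
    """Hypothetical store: one primary, one replica, replication applied lazily."""

    def __init__(self) -> None:
        self.primary: dict[str, bytes] = {}
        self.replica: dict[str, bytes] = {}
        self._pending: deque[tuple[str, bytes]] = deque()

    def put(self, key: str, data: bytes) -> None:
        self.primary[key] = data           # write is acknowledged once the primary has it
        self._pending.append((key, data))  # the replica catches up later

    def replicate(self) -> None:
        # Drain the replication log, as a background worker would.
        while self._pending:
            key, data = self._pending.popleft()
            self.replica[key] = data

    def read_from_primary(self, key: str) -> bytes | None:
        return self.primary.get(key)

    def read_from_replica(self, key: str) -> bytes | None:
        return self.replica.get(key)


store = AsyncReplicatedBlob()
store.put("avatar.png", b"v1")
store.replicate()
store.put("avatar.png", b"v2")
print(store.read_from_replica("avatar.png"))  # b'v1' (stale: replica has not caught up)
store.replicate()
print(store.read_from_replica("avatar.png"))  # b'v2'
```

Reading from the primary (or from a store with strong read-after-write guarantees) avoids the stale value, usually at some cost in latency or availability.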
Use blob storage when your system needs to store large amounts of unstructured data (images, videos, backups) with high durability and global accessibility, especially once data reaches hundreds of gigabytes or more.
Avoid blob storage for small, highly transactional data that needs low-latency reads and writes and strong consistency across records, such as relational data or real-time analytics.
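To make the cost trade-off concrete, a rough monthly estimate can add up the usual billing components: stored volume, requests, and egress. The `PriceSheet` rates below are placeholders chosen for illustration only, not any provider's actual prices, and the workload figures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PriceSheet:
    """Placeholder rates (USD) for illustration only; real prices vary by provider, region, and tier."""
    storage_per_gb_month: float
    per_1k_reads: float
    per_1k_writes: float
    egress_per_gb: float


def monthly_cost(prices: PriceSheet, stored_gb: float, reads: int, writes: int, egress_gb: float) -> dict[str, float]:
    # Each component scales linearly with its own usage dimension.
    storage = stored_gb * prices.storage_per_gb_month
    requests = reads / 1000 * prices.per_1k_reads + writes / 1000 * prices.per_1k_writes
    egress = egress_gb * prices.egress_per_gb
    return {"storage": storage, "requests": requests, "egress": egress, "total": storage + requests + egress}


# Hypothetical workload: 500 GB stored, 10M reads, 1M writes, 2,000 GB served to users.
prices = PriceSheet(storage_per_gb_month=0.02, per_1k_reads=0.0004, per_1k_writes=0.005, egress_per_gb=0.09)
for item, usd in monthly_cost(prices, stored_gb=500, reads=10_000_000, writes=1_000_000, egress_gb=2_000).items():
    print(f"{item:>8}: ${usd:,.2f}")
```

With these placeholder rates, egress (about $180) dominates a total of about $199, which is why frequently downloaded data is often fronted by a CDN or cache.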
Real-World Examples
Netflix
Stores its large video files in Amazon S3 for durability; the encoded streams are delivered to millions of users through Netflix's own CDN, Open Connect.
Uber
Uses Azure Blob Storage to store trip logs, images, and other unstructured data with scalable access and backup.
Shopify
Stores product images and backups in blob storage to handle large volumes of data with high availability.
Alternatives
File Storage (Network Attached Storage)
Stores data in a hierarchical file system (directories and files) accessible over a network, for example via NFS or SMB.
Use when: applications require traditional file-system semantics and low latency within a local network.
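Blob stores have no real directories, but clients often present a folder view by listing keys with a prefix and a delimiter, similar in spirit to the `Prefix`/`Delimiter` parameters of S3's ListObjectsV2. The helper below is a hypothetical sketch of that grouping over a plain list of keys, not a provider API.

```python
from __future__ import annotations


def list_with_delimiter(keys: list[str], prefix: str = "", delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Hypothetical helper: present flat keys as one 'folder' level.

    Returns (objects directly under prefix, sub-prefixes that look like folders).
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    objects: list[str] = []
    folders: set[str] = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # no further delimiter: a "file" at this level
        else:
            # Everything up to and including the next delimiter is a "folder".
            folders.add(prefix + rest[: cut + len(delimiter)])
    return objects, sorted(folders)


keys = ["photos/2024/cat.jpg", "photos/2024/dog.jpg", "photos/readme.txt", "backups/db.tar"]
print(list_with_delimiter(keys))             # ([], ['backups/', 'photos/'])
print(list_with_delimiter(keys, "photos/"))  # (['photos/readme.txt'], ['photos/2024/'])
```

Because the hierarchy is only a naming convention, operations a file system does cheaply, such as renaming a directory, become a copy and delete of every key under the prefix.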
Block Storage
Provides raw storage volumes for operating systems or databases, exposing block-level devices rather than objects.
Use when: you need low-level storage for databases or virtual machines with high IOPS and low latency.
Summary
Blob storage stores large unstructured data as objects with metadata in a scalable, durable way.
It uses replication and distributed storage nodes to ensure high availability and fault tolerance.
Blob storage is ideal for media files, backups, and big data, but not for low-latency transactional data.