
Blob storage (S3, Azure Blob) in HLD - System Design Exercise


Design: Blob Storage Service

Scope: the core blob storage system, including the API, storage backend, metadata management, security, and lifecycle management. Client SDKs and CDN integration are out of scope.

Functional Requirements

FR1: Store and retrieve large binary files (images, videos, documents) efficiently

FR2: Support upload, download, and delete operations for blobs

FR3: Provide secure access with authentication and authorization

FR4: Allow scalable storage to handle petabytes of data

FR5: Support metadata tagging for blobs

FR6: Enable versioning of blobs to keep track of changes

FR7: Provide high availability and durability of stored data

FR8: Allow lifecycle management to archive or delete old blobs automatically

Non-Functional Requirements

NFR1: Handle up to 1 million concurrent users

NFR2: Support low latency for read operations (p99 < 200ms)

NFR3: Ensure 99.9% availability

NFR4: Data durability of 99.999999999% (11 nines)

NFR5: Support eventual consistency for metadata updates

NFR6: Secure data in transit and at rest
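
The availability and durability targets in NFR3 and NFR4 are easier to reason about as concrete budgets. The sketch below converts them into hours of allowed downtime per year and an expected number of lost objects per year. It assumes the durability figure is an annual, per-object probability (the usual reading of "11 nines"); the function names and the object count are illustrative, not part of the original exercise.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(availability: float) -> float:
    """Hours per year the service may be unavailable at this availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def expected_annual_losses(nines: int, object_count: int) -> float:
    """Expected objects lost per year, assuming an annual per-object
    loss probability of 10**-nines (an assumption, see lead-in)."""
    return (10.0 ** -nines) * object_count


# NFR3: 99.9% availability
print(f"downtime budget: {downtime_budget_hours(0.999):.2f} h/year")

# NFR4: 11 nines of durability over a hypothetical 10 billion objects
print(f"expected losses: {expected_annual_losses(11, 10_000_000_000):.2f} objects/year")
```

At 99.9% availability the budget is about 8.76 hours of downtime per year, which is a much looser target than the durability goal. That gap is why the two are usually handled by different mechanisms (redundant serving paths versus replication or erasure coding).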


Key Components

API Gateway or Load Balancer

Authentication and Authorization Service

Blob Storage Nodes (object storage servers)

Metadata Database

Cache Layer for frequently accessed blobs

Lifecycle Management Service

Replication and Backup Systems

Design Patterns

Sharding and partitioning for storage scalability

Eventual consistency for metadata updates

Multi-region replication for durability and availability

Write-ahead logging for data durability

Token-based authentication (e.g., OAuth, signed URLs)
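
To make the signed URL pattern concrete, here is a minimal sketch of HMAC-based URL signing and verification using only the Python standard library. The parameter names (`expires`, `sig`), the message layout, and the function names are assumptions chosen for illustration; real services such as S3 or Azure Blob use their own, more elaborate signing schemes.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(secret: bytes, method: str, path: str, expires: int) -> str:
    # Bind the signature to the HTTP method, the blob path, and the expiry time.
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(secret: bytes, method: str, path: str, expires: int) -> str:
    """Return `path` with an expiry timestamp and signature appended."""
    query = urlencode({"expires": expires, "sig": _signature(secret, method, path, expires)})
    return f"{path}?{query}"


def verify_url(secret: bytes, method: str, url: str, now: int) -> bool:
    """Check that the URL is unexpired and was signed with `secret`."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        provided = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(secret, method, parts.path, expires)
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(provided, expected)


secret = b"demo-secret"  # hypothetical key; a real one would come from a secret store
url = sign_url(secret, "GET", "/blobs/photo.jpg", expires=1_700_000_600)
print(url)
print(verify_url(secret, "GET", url, now=1_700_000_000))
```

Because the method is part of the signed message, a URL issued for `GET` cannot be replayed as a `DELETE`, and the storage nodes can verify it without calling the auth service on every request.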

Reference Architecture

Client
  |
  v
API Gateway / Load Balancer
  |
  v
Auth Service <--> Metadata DB
  |
  v
Blob Storage Nodes (distributed object storage cluster)
  |
  v
Replication & Backup Systems
  |
  v
Lifecycle Management Service

Cache Layer (Redis or CDN) connected to Blob Storage Nodes for fast reads

Components

API Gateway / Load Balancer
Technology: Nginx, AWS ALB, Azure Front Door
Role: Route client requests, handle SSL termination, and distribute load

Authentication and Authorization Service
Technology: OAuth 2.0, JWT tokens
Role: Verify user identity and permissions for blob operations

Blob Storage Nodes
Technology: Distributed object storage (e.g., Ceph, custom storage nodes)
Role: Store blob data reliably and serve read/write requests

Metadata Database
Technology: Relational DB (PostgreSQL) or NoSQL (DynamoDB)
Role: Store blob metadata, tags, version info, and access control lists

Cache Layer
Technology: Redis or CDN
Role: Cache frequently accessed blobs to reduce latency

Lifecycle Management Service
Technology: Scheduled jobs or serverless functions
Role: Automatically archive or delete blobs based on policies

Replication and Backup Systems
Technology: Cross-region replication, snapshot backups
Role: Ensure data durability and availability in case of failures

Request Flow

1. Client sends request to API Gateway to upload/download/delete blob

2. API Gateway forwards request to Authentication Service to verify user

3. If authorized, request proceeds to Blob Storage Nodes for data operations

4. Blob data is stored in distributed storage with replication

5. Metadata updates are written to the Metadata Database asynchronously once the blob data is stored

6. Cache Layer serves repeated read requests to reduce latency

7. Lifecycle Management Service runs periodically to enforce retention policies
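
The periodic lifecycle pass in step 7 can be sketched as a pure function that takes the current blobs and rules and returns the actions to perform. This is a simplified illustration: the `LifecycleRule` fields (a name prefix and a minimum age) and the "first matching rule wins" ordering are assumptions, since the exercise does not specify how policies are expressed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Blob:
    name: str
    created_at: datetime


@dataclass(frozen=True)
class LifecycleRule:
    prefix: str         # hypothetical filter: match blobs whose name starts with this
    min_age: timedelta  # rule applies once the blob is at least this old
    action: str         # "archive" or "delete"


def plan_actions(blobs, rules, now):
    """Return (blob name, action) pairs; the first matching rule wins per blob."""
    plan = []
    for blob in blobs:
        age = now - blob.created_at
        for rule in rules:
            if blob.name.startswith(rule.prefix) and age >= rule.min_age:
                plan.append((blob.name, rule.action))
                break
    return plan


now = datetime(2024, 1, 1)
rules = [
    # Stricter rule first so that old logs are deleted rather than archived.
    LifecycleRule(prefix="logs/", min_age=timedelta(days=365), action="delete"),
    LifecycleRule(prefix="logs/", min_age=timedelta(days=30), action="archive"),
]
blobs = [
    Blob("logs/2022-11.gz", now - timedelta(days=400)),
    Blob("logs/2023-11.gz", now - timedelta(days=40)),
    Blob("images/cat.png", now - timedelta(days=400)),
]
print(plan_actions(blobs, rules, now))
```

Separating planning from execution keeps the job easy to test and lets the scheduler run it as a dry run before any blob is actually archived or deleted.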

Database Schema

Entities:
- Blob: id (PK), name, size, content_type, created_at, updated_at, version, storage_location
- Metadata: blob_id (FK), key, value
- User: id (PK), username, credentials
- AccessControlList: blob_id (FK), user_id (FK), permission
- LifecyclePolicy: id (PK), blob_filter, action, schedule

Relationships:
- One Blob has many Metadata entries
- One Blob has many AccessControlList entries
- Users have permissions via AccessControlList
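
As a rough illustration, the entities above can be expressed as relational tables. The sketch below uses Python's built-in `sqlite3` module so it runs without a server; the column types, the composite primary keys, the cascade behavior, and the table name `app_user` (chosen because `user` is a reserved word in PostgreSQL) are all assumptions layered on top of the entity list.

```python
import sqlite3

SCHEMA = """
CREATE TABLE blob (
    id               TEXT PRIMARY KEY,
    name             TEXT NOT NULL,
    size             INTEGER NOT NULL,
    content_type     TEXT,
    created_at       TEXT NOT NULL,
    updated_at       TEXT NOT NULL,
    version          INTEGER NOT NULL DEFAULT 1,
    storage_location TEXT NOT NULL
);
CREATE TABLE metadata (
    blob_id TEXT NOT NULL REFERENCES blob(id) ON DELETE CASCADE,
    key     TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (blob_id, key)
);
CREATE TABLE app_user (
    id          TEXT PRIMARY KEY,
    username    TEXT NOT NULL UNIQUE,
    credentials TEXT NOT NULL  -- assumed to hold a password hash, never plaintext
);
CREATE TABLE access_control_list (
    blob_id    TEXT NOT NULL REFERENCES blob(id) ON DELETE CASCADE,
    user_id    TEXT NOT NULL REFERENCES app_user(id) ON DELETE CASCADE,
    permission TEXT NOT NULL,
    PRIMARY KEY (blob_id, user_id, permission)
);
CREATE TABLE lifecycle_policy (
    id          TEXT PRIMARY KEY,
    blob_filter TEXT NOT NULL,
    action      TEXT NOT NULL,
    schedule    TEXT NOT NULL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces foreign keys when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO blob (id, name, size, content_type, created_at, updated_at, storage_location) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("b1", "photo.jpg", 204800, "image/jpeg", "2024-01-01", "2024-01-01", "node-a/b1"),
)
conn.execute("INSERT INTO metadata (blob_id, key, value) VALUES (?, ?, ?)", ("b1", "album", "holiday"))
print(conn.execute("SELECT key, value FROM metadata WHERE blob_id = ?", ("b1",)).fetchall())
```

Note that `storage_location` only points at the blob bytes; the bytes themselves live on the storage nodes, which keeps the metadata database small and independently scalable.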

Scaling Discussion

Bottlenecks

API Gateway overload with too many concurrent requests

Metadata Database becoming a single point of contention

Blob Storage Nodes running out of capacity or bandwidth

Cache misses causing high latency on popular blobs

Replication lag affecting data durability guarantees

Solutions

Use horizontal scaling and auto-scaling groups for API Gateway

Partition metadata database by blob ID or user to distribute load

Add more storage nodes and use sharding to balance data

Implement multi-level caching and CDN integration

Use asynchronous replication with conflict resolution and monitor lag
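
One common way to implement the sharding mentioned above is consistent hashing, which keeps most blobs in place when a storage node is added or removed. The sketch below is a minimal hash ring with virtual nodes; the class name, the node names, and the default of 100 virtual nodes per server are illustrative assumptions, and the exercise itself does not prescribe a particular placement scheme.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping blob IDs to storage nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash position, node name)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each physical node owns many points on the ring to even out load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def locate(self, blob_id: str, replicas: int = 1) -> list[str]:
        """Return up to `replicas` distinct nodes, walking clockwise from the key."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        start = bisect.bisect_left(self._ring, (self._hash(blob_id), ""))
        found: list[str] = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.locate("blob-123", replicas=2))  # primary node plus one replica
```

When a fourth node joins, only the blobs whose ring position now falls before one of the new node's points change their primary owner; every other blob stays where it was, which limits rebalancing traffic.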

Interview Tips

Time: 10 minutes for requirements and clarifications, 15 minutes for architecture and components, 10 minutes for scaling and trade-offs, 10 minutes for Q&A

Clarify functional and non-functional requirements upfront

Explain choice of distributed object storage for scalability

Discuss metadata management separate from blob data

Highlight security and access control mechanisms

Address consistency and replication trade-offs

Show awareness of caching and lifecycle management

Discuss bottlenecks and scaling strategies clearly