
Blob storage (S3, Azure Blob) in HLD - System Design Exercise

Design: Blob Storage Service
This design covers the core blob storage system: API, storage backend, metadata management, security, and lifecycle management. Client SDKs and detailed CDN integration are out of scope.
Functional Requirements
FR1: Store and retrieve large binary files (images, videos, documents) efficiently
FR2: Support upload, download, and delete operations for blobs
FR3: Provide secure access with authentication and authorization
FR4: Allow scalable storage to handle petabytes of data
FR5: Support metadata tagging for blobs
FR6: Enable versioning of blobs to keep track of changes
FR7: Provide high availability and durability of stored data
FR8: Allow lifecycle management to archive or delete old blobs automatically
Non-Functional Requirements
NFR1: Handle up to 1 million concurrent users
NFR2: Support low latency for read operations (p99 < 200ms)
NFR3: Ensure 99.9% availability
NFR4: Data durability of 99.999999999% (11 nines)
NFR5: Support eventual consistency for metadata updates
NFR6: Secure data in transit and at rest
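To make NFR3 and NFR4 concrete, the short sketch below converts the availability and durability targets into a yearly downtime budget and an expected number of lost objects. It assumes the durability figure is an annual, per-object probability; the helper names and the 10 billion object count are illustrative, not part of the exercise.

```python
# Back-of-the-envelope numbers for NFR3 (availability) and NFR4 (durability).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_hours(availability: float) -> float:
    """Hours per year the service may be unavailable at the given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR / 3600


def expected_annual_losses(object_count: int, durability: float) -> float:
    """Expected number of objects lost per year.

    Assumption: durability is the annual probability that one object survives.
    """
    return object_count * (1.0 - durability)


print(round(allowed_downtime_hours(0.999), 2))  # 8.76 hours per year
# Hypothetical fleet of 10 billion objects at 11 nines of durability.
print(f"{expected_annual_losses(10_000_000_000, 0.99999999999):.2f}")  # 0.10
```

Under these assumptions, 99.9% availability leaves under nine hours of downtime per year, and 11 nines of durability means losing about one object per decade across 10 billion stored objects.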
Think Before You Design
Key Components
API Gateway or Load Balancer
Authentication and Authorization Service
Blob Storage Nodes (object storage servers)
Metadata Database
Cache Layer for frequently accessed blobs
Lifecycle Management Service
Replication and Backup Systems
Design Patterns
Sharding and partitioning for storage scalability
Eventual consistency for metadata updates
Multi-region replication for durability and availability
Write-ahead logging for data durability
Token-based access control (e.g., OAuth 2.0 access tokens, signed URLs)
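As an illustration of the signed-URL pattern listed above, here is a minimal sketch that signs a blob path with HMAC-SHA256 and an expiry time. The secret key, the query parameter names (`expires`, `signature`), and the message layout are assumptions made for this example; they do not match the exact scheme used by S3 or Azure Blob.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical key; a real service would load it from a secret store.
SECRET_KEY = b"demo-secret"


def _signature(method: str, path: str, expires_at: int, key: bytes) -> str:
    """HMAC-SHA256 over the method, path, and expiry timestamp."""
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(path: str, method: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Return the path with expiry and signature appended as query parameters."""
    query = urlencode(
        {"expires": expires_at, "signature": _signature(method, path, expires_at, key)}
    )
    return f"{path}?{query}"


def verify_url(url: str, method: str, now: Optional[int] = None,
               key: bytes = SECRET_KEY) -> bool:
    """Accept the URL only if it has not expired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False
    expected = _signature(method, parsed.path, expires_at, key)
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(expected, signature)


url = sign_url("/blobs/photos/cat.jpg", "GET", expires_at=int(time.time()) + 300)
print(verify_url(url, "GET"))
```

Because the method and path are part of the signed message, a URL issued for downloading one blob cannot be reused to upload or to read a different blob.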
Reference Architecture
Client
  |
  v
API Gateway / Load Balancer
  |
  v
Auth Service <--> Metadata DB
  |
  v
Blob Storage Nodes (distributed object storage cluster)
  |
  v
Replication & Backup Systems
  |
  v
Lifecycle Management Service

Cache Layer (Redis or CDN) connected to Blob Storage Nodes for fast reads
Components
API Gateway / Load Balancer
  Technology: Nginx, AWS ALB, Azure Front Door
  Role: Route client requests, handle SSL termination, and distribute load
Authentication and Authorization Service
  Technology: OAuth 2.0, JWT tokens
  Role: Verify user identity and permissions for blob operations
Blob Storage Nodes
  Technology: Distributed object storage (e.g., Ceph, custom storage nodes)
  Role: Store blob data reliably and serve read/write requests
Metadata Database
  Technology: Relational DB (PostgreSQL) or NoSQL (DynamoDB)
  Role: Store blob metadata, tags, version info, and access control lists
Cache Layer
  Technology: Redis or CDN
  Role: Cache frequently accessed blobs to reduce latency
Lifecycle Management Service
  Technology: Scheduled jobs or serverless functions
  Role: Automatically archive or delete blobs based on policies
Replication and Backup Systems
  Technology: Cross-region replication, snapshot backups
  Role: Ensure data durability and availability in case of failures
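To show what the Lifecycle Management Service might evaluate on each scheduled run, the sketch below matches blobs against age-based rules and returns the planned actions. The rule shape (name prefix, minimum age, action) and the first-match-wins ordering are assumptions for illustration, not a policy format defined in this exercise.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BlobRecord:
    name: str
    last_modified: datetime


@dataclass(frozen=True)
class LifecycleRule:
    prefix: str         # rule applies to blob names starting with this prefix
    min_age: timedelta  # blob must be at least this old for the rule to fire
    action: str         # "archive" or "delete"


def plan_actions(blobs, rules, now):
    """Return (blob name, action) pairs; the first matching rule wins."""
    plan = []
    for blob in blobs:
        for rule in rules:
            old_enough = now - blob.last_modified >= rule.min_age
            if blob.name.startswith(rule.prefix) and old_enough:
                plan.append((blob.name, rule.action))
                break
    return plan


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Rules are ordered from most to least aggressive so that deletion takes priority.
rules = [
    LifecycleRule("logs/", timedelta(days=365), "delete"),
    LifecycleRule("logs/", timedelta(days=30), "archive"),
]
blobs = [
    BlobRecord("logs/a.log", now - timedelta(days=400)),
    BlobRecord("logs/b.log", now - timedelta(days=45)),
    BlobRecord("logs/c.log", now - timedelta(days=5)),
    BlobRecord("images/x.png", now - timedelta(days=400)),
]
print(plan_actions(blobs, rules, now))
```

Separating planning from execution keeps the policy logic easy to test; a scheduled job or serverless function would then apply each planned action to the storage nodes.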
Request Flow
1. Client sends request to API Gateway to upload/download/delete blob
2. API Gateway forwards request to Authentication Service to verify user
3. If authorized, request proceeds to Blob Storage Nodes for data operations
4. Blob data is stored in distributed storage with replication
5. Once the data is stored, metadata updates are written to the Metadata Database asynchronously
6. Cache Layer serves repeated read requests to reduce latency
7. Lifecycle Management Service runs periodically to enforce retention policies
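The flow above can be sketched as a small in-memory service. This is a simplified model under stated assumptions: dictionaries stand in for the storage nodes and the metadata database, a static token table stands in for the authentication service, metadata is written synchronously rather than asynchronously, and per-blob permission checks are omitted. The class and method names are hypothetical.

```python
import hashlib
from collections import OrderedDict


class BlobService:
    """In-memory model of the request flow: authenticate, store, record metadata, cache."""

    def __init__(self, tokens, cache_size=2):
        self.tokens = tokens        # token -> username (stand-in for the auth service)
        self.storage = {}           # key -> bytes (stand-in for storage nodes)
        self.metadata = {}          # key -> dict (stand-in for the metadata database)
        self.cache = OrderedDict()  # small LRU cache for repeated reads
        self.cache_size = cache_size
        self.cache_hits = 0

    def _authenticate(self, token):
        user = self.tokens.get(token)
        if user is None:
            raise PermissionError("invalid token")
        return user

    def upload(self, token, key, data):
        owner = self._authenticate(token)
        self.storage[key] = data
        # Simplification: a real system would queue this write asynchronously.
        self.metadata[key] = {
            "owner": owner,
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        self.cache.pop(key, None)  # drop any stale cached copy

    def download(self, token, key):
        self._authenticate(token)
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.cache_hits += 1
            return self.cache[key]
        data = self.storage[key]  # raises KeyError if the blob does not exist
        self.cache[key] = data
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data

    def delete(self, token, key):
        self._authenticate(token)
        self.storage.pop(key, None)
        self.metadata.pop(key, None)
        self.cache.pop(key, None)


service = BlobService({"token-1": "alice"})
service.upload("token-1", "notes.txt", b"hello")
print(service.download("token-1", "notes.txt"))
```

The first download reads from storage and fills the cache; later downloads of the same key are served from the cache until the blob is overwritten or deleted.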
Database Schema
Entities:
- Blob: id (PK), name, size, content_type, created_at, updated_at, version, storage_location
- Metadata: blob_id (FK), key, value
- User: id (PK), username, credentials
- AccessControlList: blob_id (FK), user_id (FK), permission
- LifecyclePolicy: id (PK), blob_filter, action, schedule
Relationships:
- One Blob has many Metadata entries
- One Blob has many AccessControlList entries
- Users have permissions via AccessControlList
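One possible rendering of these entities as tables is sketched below, using SQLite from Python's standard library so it runs without a server. The table names, column types, composite primary keys, and cascade rules are assumptions; the exercise names only the fields and relationships.

```python
import sqlite3

# Table names and column types are illustrative choices, not part of the exercise.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    credentials TEXT NOT NULL
);
CREATE TABLE blobs (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    size INTEGER NOT NULL,
    content_type TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1,
    storage_location TEXT NOT NULL
);
CREATE TABLE blob_metadata (
    blob_id INTEGER NOT NULL REFERENCES blobs(id) ON DELETE CASCADE,
    "key" TEXT NOT NULL,
    "value" TEXT,
    PRIMARY KEY (blob_id, "key")
);
CREATE TABLE access_control_list (
    blob_id INTEGER NOT NULL REFERENCES blobs(id) ON DELETE CASCADE,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    permission TEXT NOT NULL,
    PRIMARY KEY (blob_id, user_id, permission)
);
CREATE TABLE lifecycle_policies (
    id INTEGER PRIMARY KEY,
    blob_filter TEXT NOT NULL,
    action TEXT NOT NULL,
    schedule TEXT NOT NULL
);
"""


def create_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print([row[0] for row in tables])
```

With the cascade rules assumed here, deleting a blob row also removes its metadata tags and access control entries, which keeps the one-to-many relationships consistent.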
Scaling Discussion
Bottlenecks
API Gateway overload with too many concurrent requests
Metadata Database becoming a single point of contention
Blob Storage Nodes running out of capacity or bandwidth
Cache misses causing high latency on popular blobs
Replication lag affecting data durability guarantees
Solutions
Use horizontal scaling and auto-scaling groups for API Gateway
Partition metadata database by blob ID or user to distribute load
Add more storage nodes and use sharding to balance data
Implement multi-level caching and CDN integration
Use asynchronous replication with conflict resolution and monitor lag
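One common way to shard blobs across storage nodes so that adding a node moves only a fraction of the data is consistent hashing. The sketch below is a minimal ring with virtual nodes; the node names, the count of 100 virtual nodes, and the use of SHA-256 are illustrative assumptions, and the exercise itself does not prescribe a specific sharding scheme.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring that maps blob keys to storage nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []        # sorted list of (hash position, node name)
        self._vnodes = vnodes  # virtual nodes per physical node, to smooth the spread
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position after the key's hash."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        index = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]  # wrap around the ring


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"blob-{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add_node("node-d")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

Every key that changes owner lands on the newly added node, so existing nodes never exchange data with each other during a scale-out. The same idea can be applied to partitioning the metadata database by blob ID.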
Interview Tips
Time (45 minutes total): 10 minutes for requirements and clarifications, 15 minutes for architecture and components, 10 minutes for scaling and trade-offs, 10 minutes for Q&A
Clarify functional and non-functional requirements upfront
Explain choice of distributed object storage for scalability
Discuss metadata management separate from blob data
Highlight security and access control mechanisms
Address consistency and replication trade-offs
Show awareness of caching and lifecycle management
Discuss bottlenecks and scaling strategies clearly