
Media storage and CDN in HLD - System Design Exercise

Design: Media Storage and CDN System
Includes media upload, storage, CDN distribution, access control, and analytics. Excludes media editing or transcoding features.
Functional Requirements
FR1: Store large volumes of media files (images, videos, audio) uploaded by users
FR2: Serve media content to users with low latency globally
FR3: Support high read traffic with caching and content delivery
FR4: Allow secure access to media files with authentication and authorization
FR5: Support media file versioning and updates
FR6: Provide analytics on media access patterns
Non-Functional Requirements
NFR1: Handle 1 million media uploads per day
NFR2: Support 10 million daily media content requests globally
NFR3: API response latency for media requests should be under 200ms (p99)
NFR4: System availability target of 99.9% uptime
NFR5: Data durability with replication and backups
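The NFRs translate into fairly modest average rates, which is worth working out explicitly. The sketch below shows the arithmetic.

The 5 MB average file size is a hypothetical assumption, since the requirements do not give one. Real traffic is also bursty, so peak capacity is normally planned at a multiple of these averages.

```python
# Back-of-the-envelope estimates derived from NFR1, NFR2 and NFR4.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
HOURS_PER_YEAR = 365 * 24        # 8,760 (ignores leap years)


def average_rate_per_second(events_per_day: int) -> float:
    """Average events per second, assuming traffic is spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def yearly_downtime_hours(availability: float) -> float:
    """Allowed downtime per year for an availability fraction such as 0.999."""
    return (1.0 - availability) * HOURS_PER_YEAR


def daily_upload_volume_tb(uploads_per_day: int, avg_file_mb: float) -> float:
    """New data per day in terabytes (decimal units), before replication."""
    return uploads_per_day * avg_file_mb / 1_000_000


if __name__ == "__main__":
    print(f"uploads/s (avg):  {average_rate_per_second(1_000_000):.1f}")
    print(f"requests/s (avg): {average_rate_per_second(10_000_000):.1f}")
    print(f"downtime budget:  {yearly_downtime_hours(0.999):.2f} h/year")
    # Hypothetical 5 MB average file size; the requirements do not specify one.
    print(f"new data/day:     {daily_upload_volume_tb(1_000_000, 5.0):.1f} TB")
```

This works out to about 11.6 uploads/s and about 116 content requests/s on average. A 99.9% target allows roughly 8.76 hours of downtime per year. At the assumed 5 MB per file, uploads add about 5 TB of new data per day before replication.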
Think Before You Design
Key Components
Object storage service for media files
Content Delivery Network (CDN) for caching and global delivery
API gateway for upload and access requests
Authentication and authorization service
Metadata database for media info and versions
Logging and analytics service
Design Patterns
Read-through (pull) caching at CDN edges: on a miss, the edge fetches the file from object storage and keeps a copy
Event-driven invalidation for cache updates
Multi-region replication for durability and availability
Token-based secure access to media URLs
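Token-based access usually means the API Gateway hands the client a short-lived signed URL. The CDN edge can verify that URL without calling back to the Authentication Service.

The sketch below is minimal and assumes a hypothetical scheme: both sides share an HMAC-SHA256 key, and the token covers the path plus an expiry timestamp. CloudFront, Cloudflare and Akamai each define their own signed-URL formats, and this is not any of them. `SIGNING_KEY`, `sign_url` and `verify_url` are illustrative names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical key shared by the API Gateway (signer) and the CDN edge (verifier).
# A real deployment would load it from a secrets manager, not source code.
SIGNING_KEY = b"replace-with-a-real-secret"


def _signature(path: str, expires: int, key: bytes) -> str:
    # Bind the exact path and the expiry time together in one MAC.
    message = f"{path}|{expires}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, path: str, ttl_seconds: int,
             key: bytes = SIGNING_KEY, now: Optional[float] = None) -> str:
    """Issued by the API Gateway after it has authorized the user."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires, key)})
    return f"{base_url}{path}?{query}"


def verify_url(url: str, key: bytes = SIGNING_KEY,
               now: Optional[float] = None) -> bool:
    """Edge-side check: the signature must match and the URL must not be expired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(_signature(parts.path, expires, key), sig)
```

Because the expiry is part of the signed message, a client cannot extend access by editing the `expires` parameter. Changing the path also invalidates the signature.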
Reference Architecture
Client
  |
  | HTTP Upload/Download Requests
  v
API Gateway --- Authentication Service
  |
  | Upload -> Store metadata
  | Download -> Validate access
  v
Metadata DB <-> Object Storage (S3-like)
  |
  | Media files
  v
CDN Edge Locations
  |
  | Cached media delivery
  v
Clients Worldwide

Analytics Service collects logs from API Gateway and CDN
Components
API Gateway (Nginx or AWS API Gateway): handles client requests for media upload and download and routes them to backend services
Authentication Service (OAuth 2.0 / JWT): validates user identity and permissions for media access
Metadata Database (PostgreSQL or DynamoDB): stores media metadata, user info, versions, and access control data
Object Storage (Amazon S3 or MinIO): stores the media files themselves with high durability and scalability
CDN (Cloudflare, AWS CloudFront, or Akamai): caches media files at edge locations for low-latency global delivery
Analytics Service (Elastic Stack or Google BigQuery): collects and analyzes access logs for usage patterns and metrics
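In production, the Analytics Service would compute its metrics as queries over Elasticsearch or BigQuery. The in-memory sketch below only illustrates two useful aggregates: the most-downloaded media and the CDN cache hit ratio.

`AccessEvent` loosely mirrors the AccessLog entity. It adds a hypothetical `cache_hit` flag of the kind CDN access logs typically record.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AccessEvent:
    media_id: str
    user_id: str
    action_type: str   # e.g. "download" or "upload"
    cache_hit: bool    # hypothetical field taken from CDN logs


def top_media(events: Iterable[AccessEvent], n: int) -> List[Tuple[str, int]]:
    """The n most-downloaded media IDs with their download counts."""
    counts = Counter(e.media_id for e in events if e.action_type == "download")
    return counts.most_common(n)


def cache_hit_ratio(events: Iterable[AccessEvent]) -> float:
    """Fraction of downloads served from the CDN edge without an origin fetch."""
    downloads = [e for e in events if e.action_type == "download"]
    if not downloads:
        return 0.0
    return sum(e.cache_hit for e in downloads) / len(downloads)
```

The top-media list is also a natural input for cache pre-warming. A falling hit ratio is an early signal of the cache-miss latency problem discussed under scaling.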
Request Flow
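1. User sends a media upload request to the API Gateway
2. API Gateway authenticates the user via the Authentication Service
3. The media file is written to Object Storage
4. Once the file is stored, its metadata (owner, size, content type, version) is recorded in the Metadata Database, so no metadata row points at a missing file
5. User requests a media file via the API Gateway
6. API Gateway checks the user's access rights with the Authentication Service and returns a signed, time-limited CDN URL
7. The client fetches the file from the nearest CDN edge location using that URL
8. If the media is cached at the edge, it is served directly to the user
9. If not, the CDN fetches it from Object Storage, caches it at the edge, and serves it
10. Access logs from the API Gateway and the CDN are sent to the Analytics Service for processing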
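Steps 8 and 9 describe read-through behaviour at the edge: on a miss, the CDN itself (not the client) fetches from object storage and keeps a copy.

The toy single-edge model below assumes LRU eviction by object count. Real CDNs evict by bytes and honour TTLs from `Cache-Control`. `EdgeCache` and the `fetch_from_origin` callback are hypothetical names standing in for a CDN edge and an object-storage GET. The `invalidate` method is where an event-driven purge, for example on a media update, would land.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy model of one CDN edge: an LRU cache in front of object storage."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self._entries = OrderedDict()  # key -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        body = self.fetch_from_origin(key)      # step 9: miss goes to origin
        self._entries[key] = body
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict least recently used
        return body

    def invalidate(self, key: str) -> None:
        """Drop a cached copy, e.g. when a 'media updated' event arrives."""
        self._entries.pop(key, None)
```

After an `invalidate`, the next request for that key is a miss and pulls the new version from the origin.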
Database Schema
Entities:
- User (user_id, name, email, roles)
- MediaFile (media_id, user_id, filename, size, content_type, upload_timestamp, version, access_permissions)
- MediaVersion (version_id, media_id, version_number, created_at, metadata)
- AccessLog (log_id, media_id, user_id, access_time, action_type)
Relationships:
- User 1:N MediaFile
- MediaFile 1:N MediaVersion
- MediaFile 1:N AccessLog
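The entities above can be written out as DDL. The sketch uses SQLite via Python's built-in `sqlite3` so it runs without a server.

The column types, the `UNIQUE (media_id, version_number)` constraint and the indexes are assumptions, not part of the original design. On PostgreSQL one would more likely use `BIGSERIAL`, `TIMESTAMPTZ` and `JSONB`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL UNIQUE,
    roles   TEXT NOT NULL DEFAULT 'viewer'      -- comma-separated for simplicity
);
CREATE TABLE media_files (
    media_id           INTEGER PRIMARY KEY,
    user_id            INTEGER NOT NULL REFERENCES users(user_id),
    filename           TEXT NOT NULL,
    size               INTEGER NOT NULL,        -- bytes
    content_type       TEXT NOT NULL,
    upload_timestamp   TEXT NOT NULL,
    version            INTEGER NOT NULL DEFAULT 1,
    access_permissions TEXT NOT NULL DEFAULT 'private'
);
CREATE TABLE media_versions (
    version_id     INTEGER PRIMARY KEY,
    media_id       INTEGER NOT NULL REFERENCES media_files(media_id),
    version_number INTEGER NOT NULL,
    created_at     TEXT NOT NULL,
    metadata       TEXT,                        -- JSON blob
    UNIQUE (media_id, version_number)
);
CREATE TABLE access_logs (
    log_id      INTEGER PRIMARY KEY,
    media_id    INTEGER NOT NULL REFERENCES media_files(media_id),
    user_id     INTEGER REFERENCES users(user_id),
    access_time TEXT NOT NULL,
    action_type TEXT NOT NULL
);
CREATE INDEX idx_media_user ON media_files(user_id);
CREATE INDEX idx_logs_media_time ON access_logs(media_id, access_time);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

The composite index on `access_logs(media_id, access_time)` supports the per-media, time-windowed queries that analytics typically runs.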
Scaling Discussion
Bottlenecks
Object Storage throughput limits under heavy upload/download
API Gateway becoming a request bottleneck
Metadata Database write/read load with millions of media files
CDN cache misses causing latency spikes
Authentication Service latency under high concurrent requests
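The cache-miss bottleneck interacts directly with the 200 ms p99 target in NFR3. If more than 1% of requests miss the cache, and every miss is slower than every hit, then the 99th-percentile request is a miss. A good average can therefore hide a p99 set entirely by origin fetch time.

The sketch below makes that concrete. The 20 ms edge and 150 ms origin figures in the usage block are hypothetical.

```python
def mean_latency_ms(hit_ratio: float, edge_ms: float, origin_ms: float) -> float:
    """Average latency when hits cost edge_ms and misses cost edge_ms + origin_ms
    (the edge still handles a miss, then also waits on object storage)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * edge_ms + miss_ratio * (edge_ms + origin_ms)


def p99_falls_on_miss_path(hit_ratio: float) -> bool:
    """True when more than 1% of requests miss. Assuming every miss is slower
    than every hit, the 99th-percentile request is then a cache miss."""
    return (1.0 - hit_ratio) > 0.01


if __name__ == "__main__":
    # Hypothetical figures: 20 ms at the edge, 150 ms extra for an origin fetch.
    for ratio in (0.90, 0.95, 0.995):
        print(ratio, round(mean_latency_ms(ratio, 20, 150), 1),
              p99_falls_on_miss_path(ratio))
```

At a 95% hit ratio, the mean is 27.5 ms, yet p99 is governed by the 170 ms miss path. This is why pre-warming popular content matters for tail latency.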
Solutions
Use scalable object storage with multi-region replication and parallel uploads
Deploy multiple API Gateway instances behind load balancers
Use a horizontally scalable NoSQL database or sharded relational DB for metadata
Pre-warm edge caches with popular media and keep invalidation cheap, e.g. with versioned media URLs or event-driven purges
Use stateless authentication tokens (JWT) to reduce authentication service load
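A sharded metadata store needs a deterministic key-to-shard mapping. Plain `hash(media_id) % N` remaps most keys whenever N changes, so a consistent hash ring is a common alternative.

The sketch below is minimal. The class name, the 100 virtual nodes per shard, and the shard names in the usage are illustrative assumptions. Managed stores such as DynamoDB partition data internally, so this applies mainly to a self-sharded relational setup.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class ConsistentHashRing:
    """Routes metadata keys (e.g. media_id) to database shards.

    Each shard owns `vnodes` points on a 64-bit hash ring. A key belongs to the
    first point clockwise from its own hash. Adding a shard only moves keys that
    now land on the new shard's points (roughly 1/N of them).
    """

    def __init__(self, shards: Iterable[str], vnodes: int = 100):
        if vnodes <= 0:
            raise ValueError("vnodes must be positive")
        self.vnodes = vnodes
        self._points: List[int] = []       # sorted ring positions
        self._owner: Dict[int, str] = {}   # ring position -> shard name
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{shard}#{i}")
            if point in self._owner:       # 64-bit collision: vanishingly rare
                continue
            self._owner[point] = shard
            bisect.insort(self._points, point)

    def shard_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("no shards configured")
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._owner[self._points[idx % len(self._points)]]  # wrap around
```

When a shard is added, every key that moves goes to the new shard. Existing shards never trade keys among themselves, which keeps rebalancing traffic proportional to the new shard's share.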
Interview Tips
Time: 10 minutes for requirements and clarifications, 15 minutes for architecture and components, 10 minutes for scaling and trade-offs, 10 minutes for Q&A
Clarify media types, sizes, and access patterns early
Explain choice of object storage and CDN for scalability and latency
Discuss security with authentication and authorization
Describe caching strategy and cache invalidation
Highlight how the system handles scale and availability
Mention analytics for monitoring usage and performance