Video Recommendation System HLD: Scalability & System Analysis

| Scale | Users | Data Volume | Traffic | System Changes |
|---|---|---|---|---|
| Small | 100 users | Few thousand videos, user profiles | Low QPS (~100 requests/sec) | Single app server, single DB instance, simple batch recommendations |
| Medium | 10K users | Millions of videos, user interactions | Moderate QPS (~10K requests/sec) | Multiple app servers, DB read replicas, caching layer, offline model training |
| Large | 1M users | Hundreds of millions of videos, rich user data | High QPS (~100K requests/sec) | Distributed databases, sharded user data, real-time streaming data pipelines, CDN for video delivery |
| Very Large | 100M users | Billions of videos and interactions | Very high QPS (~10M requests/sec) | Multi-region deployment, advanced sharding, multi-level caching, AI model serving clusters, global CDN, data archiving |
At small scale, the database is the first bottleneck: it must absorb both reads and writes for user interactions and video metadata. As the user base grows, recommendation model training and serving become the bottleneck because of heavy computation and growing data volume. At large scale, network bandwidth and data storage become critical bottlenecks as well.
- Database: Add read replicas to absorb read traffic, use connection pooling, and eventually shard user and video data by user ID or video category.
- Caching: Cache popular recommendations and video metadata using Redis or Memcached to reduce DB load.
- Application Servers: Horizontally scale app servers behind load balancers to handle increased request volume.
- Model Training and Serving: Use distributed computing frameworks for offline training and deploy models on dedicated serving clusters with GPU acceleration.
- Data Pipelines: Implement real-time streaming pipelines (e.g., Kafka) for user activity ingestion and feature updates.
- Content Delivery: Use a global CDN to serve video content efficiently and reduce latency.
- Storage: Use distributed object storage for videos and archive old data to cheaper storage tiers.
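Two of the patterns above, caching and sharding by user ID, can be combined in one read path. The sketch below is illustrative only: `RecommendationService`, `shard_for_user`, and the dict-backed cache and shards are hypothetical stand-ins for a real Redis client and real database connections.

```python
import hashlib

NUM_SHARDS = 4

def shard_for_user(user_id: str) -> int:
    """Hash the user ID to pick a database shard deterministically."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

class RecommendationService:
    def __init__(self, cache, db_shards):
        self.cache = cache          # stand-in for Redis/Memcached
        self.db_shards = db_shards  # stand-in for sharded DB instances

    def get_recommendations(self, user_id: str):
        key = f"recs:{user_id}"
        cached = self.cache.get(key)
        if cached is not None:      # cache hit: the database is never touched
            return cached
        # Cache miss: route the read to the shard that owns this user.
        shard = self.db_shards[shard_for_user(user_id)]
        recs = shard.get(user_id, [])
        self.cache[key] = recs      # populate the cache for subsequent requests
        return recs
```

The cache-aside pattern shown here keeps the cache optional: if it is cold or evicted, reads fall through to the correct shard and repopulate it.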
- Requests per second: At 1M users, expect ~100K QPS for recommendations and video views.
- Storage: Videos require petabytes of storage; metadata and user data require terabytes to petabytes.
- Bandwidth: Video streaming consumes the most bandwidth; a 1 Gbps link can serve ~125 MB/s, so multiple CDN edge servers are needed.
- Compute: Model training requires GPU clusters; serving models requires CPU/GPU servers scaled horizontally.
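The estimates above can be sanity-checked with back-of-envelope arithmetic. The video bitrate below (~5 Mbps for an HD stream) is an assumed illustrative figure, not from the source; the link math and QPS figures follow the numbers stated above.

```python
# 1 Gbps link capacity in MB/s: 1000 Mbit divided by 8 bits per byte.
link_mb_per_s = 1 * 1000 / 8            # = 125 MB/s, matching the estimate above

# Per-user request rate implied by 1M users generating ~100K QPS.
users = 1_000_000
qps = 100_000
requests_per_user_per_s = qps / users   # = 0.1 req/s per user

# Assumed ~5 Mbps HD stream = 0.625 MB/s; how many fit on one 1 Gbps link?
video_bitrate_mb_per_s = 0.625
streams_per_link = link_mb_per_s / video_bitrate_mb_per_s  # = 200 concurrent streams
```

Two hundred concurrent HD streams per 1 Gbps link is why a single origin cannot serve video at scale and CDN edge servers are required.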
Start by clarifying scale and requirements. Discuss data volume, traffic patterns, and latency needs. Identify the first bottleneck and propose targeted solutions. Explain trade-offs between consistency, latency, and cost. Use real numbers to justify scaling steps. Show understanding of caching, sharding, and distributed systems.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas and implement caching to reduce database load before considering sharding or more complex solutions.
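The reasoning behind that answer can be quantified with a simple sketch. The 90% cache hit rate below is an assumed illustrative figure: at that hit rate, a 10x traffic spike leaves the database seeing roughly its original load.

```python
def db_qps_after_cache(total_qps: float, cache_hit_rate: float) -> float:
    """QPS that still reaches the database after the cache absorbs hits."""
    return total_qps * (1 - cache_hit_rate)

# 10,000 QPS with a 90% cache hit rate leaves ~1,000 QPS on the database,
# back at the level it handled before the 10x growth.
residual = db_qps_after_cache(10_000, 0.9)
```

Read replicas then split that residual read load further, which is why both steps come before the operational cost of sharding.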
