| Users | Data Volume | Network Traffic | Storage Needs | Infrastructure Changes |
|---|---|---|---|---|
| 100 users | Low (a few GB/day) | Low (a few Mbps) | Small (hundreds of GB) | Single server, simple CDN |
| 10,000 users | Medium (TB/day) | High (Gbps) | Large (tens of TB) | Multiple servers, CDN expansion, caching |
| 1,000,000 users | Very High (PB/month) | Very High (hundreds of Gbps) | Very Large (PB scale) | Distributed storage, multi-region CDN, load balancing |
| 100,000,000 users | Extreme (exabytes/year) | Extreme (Tbps) | Massive (multi-exabyte) | Global CDN, sharded storage, edge computing |
Why video streaming handles massive data in HLD - Scalability Evidence
As the user count grows, the biggest challenge is moving large video files to many users simultaneously and fast enough for smooth playback. Network bandwidth limits how much data can be sent at once, and storage I/O throughput limits how quickly video files can be read and served. These two resources typically saturate before CPU or memory does.
- Content Delivery Network (CDN): Distribute video copies closer to users worldwide to reduce bandwidth load on origin servers and lower latency.
- Video Compression and Adaptive Streaming: Use efficient codecs and adjust video quality based on user bandwidth to reduce data size.
- Horizontal Scaling: Add more streaming servers behind load balancers to handle more concurrent connections.
- Distributed Storage: Use sharded and replicated storage systems to handle massive video data and high read throughput.
- Edge Computing: Process and cache video data at network edges to reduce central server load and improve speed.
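The adaptive streaming item above can be illustrated with a small sketch of how a player might pick a quality level. This is a simplified throughput-based rule, not the algorithm of any particular player; the bitrate ladder, the `pick_bitrate` name, and the 0.8 safety factor are all assumptions chosen for illustration.

```python
# Hypothetical bitrate ladder in kbps; real ladders depend on codec and content.
LADDER_KBPS = [400, 1000, 2000, 4500, 8000]


def pick_bitrate(measured_kbps: float, ladder=LADDER_KBPS, safety: float = 0.8) -> int:
    """Return the highest rung that fits within a safety fraction of measured throughput."""
    budget = measured_kbps * safety
    fitting = [rate for rate in ladder if rate <= budget]
    # Fall back to the lowest rung so playback can continue on a slow link.
    return max(fitting) if fitting else min(ladder)


for throughput in (300, 3000, 20000):
    print(throughput, "kbps measured ->", pick_bitrate(throughput), "kbps stream")
```

Keeping the chosen bitrate below the measured throughput leaves headroom for bandwidth fluctuations, which trades some picture quality for fewer rebuffering events.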
Assuming 1 million users streaming 2 Mbps video simultaneously:
- Network bandwidth needed: 2 Mbps * 1,000,000 = 2 Tbps (terabits per second)
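- Data served: at 2 Mbps, 1 hour of video is about 0.9 GB per user, so 1 million users streaming for 1 hour transfer about 0.9 PB (petabytes). At a higher HD bitrate of roughly 6.7 Mbps (about 3 GB per hour), the total would be about 3 PB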
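- Requests per second: If each user requests a video chunk every 10 seconds, that is 100,000 chunk requests per second in total; the CDN should absorb most of these so that only cache misses reach the origin servers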
- Infrastructure: Requires multi-region CDN, distributed storage clusters, and high bandwidth backbone
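The arithmetic above can be reproduced with a short script, which makes it easy to rerun the estimate with different assumptions. The inputs are the figures from the text (1 million concurrent users, 2 Mbps, one chunk every 10 seconds, one hour of viewing); the function and key names are illustrative. Note that at 2 Mbps one hour of video works out to about 0.9 GB per user, so a figure of 3 GB per hour implies a higher bitrate of roughly 6.7 Mbps.

```python
# Back-of-envelope capacity estimate; all inputs are assumptions from the text.
CONCURRENT_USERS = 1_000_000
BITRATE_MBPS = 2        # per-user stream bitrate
CHUNK_SECONDS = 10      # one chunk request per user every 10 s
WATCH_SECONDS = 3600    # one hour of viewing


def estimate(users: int, bitrate_mbps: float, chunk_s: float, watch_s: float) -> dict:
    total_mbps = users * bitrate_mbps
    # Megabits -> gigabytes: divide by 8 (bits per byte), then by 1000 (MB per GB).
    gb_per_user = bitrate_mbps * watch_s / 8 / 1000
    return {
        "bandwidth_tbps": total_mbps / 1_000_000,
        "gb_per_user": gb_per_user,
        "total_pb_served": gb_per_user * users / 1_000_000,
        "chunk_requests_per_s": users / chunk_s,
    }


result = estimate(CONCURRENT_USERS, BITRATE_MBPS, CHUNK_SECONDS, WATCH_SECONDS)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

Calling `estimate` with a bitrate of about 6.7 Mbps instead gives roughly 3 GB per user per hour and about 3 PB served in total.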
Start by identifying key resources (network, storage, CPU). Discuss growth impact on each. Identify first bottleneck (usually bandwidth/storage I/O). Propose targeted solutions like CDN, compression, horizontal scaling. Quantify with rough numbers. Show understanding of trade-offs and cost.
Your video streaming database handles 1000 QPS. Traffic grows 10x. What do you do first and why?
Answer: Add read replicas and a caching layer first. At 10x traffic (about 10,000 QPS) the single database becomes the first bottleneck, and replicas plus caching take read load off the primary without changing the application's data model.
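A rough sketch of why this answer works: the numbers below (95% reads, a 90% cache hit ratio, 3 read replicas) are hypothetical values chosen for illustration, not measurements, and the model assumes reads that miss the cache are spread evenly across replicas while all writes go to the primary.

```python
def db_load(total_qps: float, read_fraction: float,
            cache_hit_ratio: float, replicas: int) -> dict:
    """Estimate per-node database load after adding a cache and read replicas."""
    reads = total_qps * read_fraction
    writes = total_qps - reads
    # Only reads that miss the cache reach the database tier.
    cache_misses = reads * (1 - cache_hit_ratio)
    return {
        "primary_qps": writes,                       # writes stay on the primary
        "per_replica_qps": cache_misses / replicas,  # misses split across replicas
    }


# Hypothetical scenario: 1,000 QPS grows 10x to 10,000 QPS.
load = db_load(total_qps=10_000, read_fraction=0.95,
               cache_hit_ratio=0.90, replicas=3)
for name, qps in load.items():
    print(f"{name}: {qps:,.0f}")
```

Under these assumptions each database node ends up serving well under the original 1,000 QPS even after the 10x growth, which is why caching and replicas are a reasonable first step before sharding.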
