| Scale | Users | Feed Requests/Second | Data Volume | Latency Expectation | System Changes |
|---|---|---|---|---|---|
| Small | 100 | 10-50 | MBs of posts | < 1s | Single server, simple DB queries |
| Medium | 10,000 | 1,000-5,000 | GBs of posts | < 500ms | DB indexing, caching, load balancer |
| Large | 1,000,000 | 50,000-100,000 | TBs of posts | < 300ms | Feed pre-generation, sharded DB, distributed cache |
| Very Large | 100,000,000 | 5,000,000+ | Petabytes of posts | < 200ms | Massive horizontal scaling, CDN, microservices, data partitioning |
## News feed generation in HLD - Scalability & System Analysis
At small scale, per-request database queries dominate feed latency: each feed is built on demand by fetching and sorting the recent posts of every user the requester follows.
At medium scale, database CPU and disk I/O become the bottleneck, because thousands of concurrent feed requests each run this fan-in query against the same instance.
At large scale, network bandwidth and cache-invalidation delays become problems in their own right: shipping tens of GB/s of post data and keeping distributed caches fresh add latency on top of the database.
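The naive "pull" model behind these bottlenecks can be sketched as a single fan-in query per feed request. The schema and table names below are assumptions for illustration, using an in-memory SQLite database:

```python
import sqlite3

# Naive pull model: every feed request joins the follow graph against the
# posts table and sorts on demand. Schema is a minimal assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (follower INT, followee INT);
    CREATE TABLE posts (author INT, body TEXT, created_at INT);
    CREATE INDEX idx_posts_author_time ON posts(author, created_at);
""")
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(2, "hello", 100), (3, "world", 200), (4, "hidden", 300)])

def naive_feed(user_id, limit=50):
    # Join + sort work grows with the follow count, and it repeats on
    # every request -- this is what saturates DB CPU/I/O at scale.
    return conn.execute("""
        SELECT p.author, p.body, p.created_at
        FROM posts p
        JOIN follows f ON f.followee = p.author
        WHERE f.follower = ?
        ORDER BY p.created_at DESC
        LIMIT ?""", (user_id, limit)).fetchall()

print(naive_feed(1))  # [(3, 'world', 200), (2, 'hello', 100)]
```

Indexing helps this query, but the per-request sort cost is why the later mitigations move work out of the read path.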
- Database Optimization: Add indexes, use read replicas to distribute read load.
- Caching: Use in-memory caches (e.g., Redis) to store popular feeds or feed fragments.
- Feed Pre-generation: Generate feeds offline and store them for quick retrieval.
- Sharding: Partition user data across multiple databases to reduce load per instance.
- Horizontal Scaling: Add more application servers behind load balancers.
- Content Delivery Network (CDN): Cache static content and reduce latency globally.
- Microservices: Split feed generation, the user service, and the post service so each can scale and be deployed independently.
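Feed pre-generation from the list above is usually implemented as fan-out on write: a new post is pushed into each follower's precomputed feed so reads become a cheap lookup. A minimal in-memory sketch (production systems would use Redis lists or similar; all names here are assumptions):

```python
from collections import defaultdict, deque

FEED_SIZE = 50  # keep only the newest N entries per user
followers = defaultdict(set)                           # author -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_SIZE))   # user -> newest-first feed

def follow(follower, author):
    followers[author].add(follower)

def publish(author, post):
    # Fan-out on write: pay the cost once at post time so each read
    # is an O(1) fetch of a precomputed feed.
    for f in followers[author]:
        feeds[f].appendleft(post)

def get_feed(user):
    return list(feeds[user])

follow("alice", "bob")
publish("bob", "post-1")
publish("bob", "post-2")
print(get_feed("alice"))  # ['post-2', 'post-1']
```

The trade-off: a celebrity with millions of followers makes writes expensive, which is why real systems mix fan-out on write for ordinary users with pull-on-read for high-follower accounts.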
- At 1M users with 100K feed requests/sec, assuming each feed request reads 50 posts (~10KB each), total data read = 100K * 50 * 10KB = ~50GB/s.
- Storage needed for posts: If each user generates 10 posts/day, 1M users produce 10M posts/day (~100GB/day assuming 10KB/post).
- Network bandwidth: 50GB/s of read traffic is ~400Gbps, i.e. on the order of forty 10Gbps links (before cache hits and CDN offload reduce it).
- CPU: Multiple servers needed to handle sorting and merging posts per feed request.
When explaining this in an interview:
1. Start with the user scale and traffic numbers.
2. Identify the main bottleneck (usually the database).
3. Discuss caching and feed pre-generation to reduce load.
4. Bring in sharding and horizontal scaling for large scale.
5. Justify why each solution addresses the bottleneck you named.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas and a caching layer first, since the database is the bottleneck and reads dominate feed traffic; the stateless application tier is easy to scale out afterwards.
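The answer's read path is the cache-aside pattern: check the cache first, fall back to a read replica on a miss, then populate the cache. A minimal sketch, with the store and query names as assumptions:

```python
import time

cache = {}   # user_id -> (feed, expiry timestamp); stand-in for Redis
TTL = 60     # seconds before a cached feed goes stale

def query_replica(user_id):
    # Stand-in for the real query against a read replica.
    return f"feed-for-{user_id}"

def get_feed(user_id):
    entry = cache.get(user_id)
    if entry and entry[1] > time.time():
        return entry[0]                        # hit: no DB load at all
    feed = query_replica(user_id)              # miss: goes to a replica,
    cache[user_id] = (feed, time.time() + TTL) # never the primary
    return feed

print(get_feed(42))  # → 'feed-for-42' (first call hits the replica)
```

Every cache hit is a query the database never sees, which is why this step comes before adding application servers.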
