| Users / Games | 100 | 10,000 | 1,000,000 | 100,000,000 |
|---|---|---|---|---|
| Concurrent Games | 50 | 5,000 | 500,000 | 50,000,000 |
| Win Checks per Second | 500 | 50,000 | 5,000,000 | 500,000,000 |
| Latency per Check | ~1 ms | ~1 ms | ~5 ms | ~10 ms |
| Memory Usage | Low (MBs) | Moderate (GBs) | High (TBs) | Very High (PBs) |
| Storage for Game States | Minimal | Moderate | Large | Very Large |
Win condition checking in LLD - Scalability & System Analysis
The first bottleneck is the CPU and memory on the application servers that perform the win condition checks. Each check requires computation to evaluate the game state. As concurrent games increase, the CPU load grows linearly. At around 10,000 concurrent games, a single server cannot keep up with the required checks per second.
- Horizontal Scaling: Add more application servers to distribute win condition checks. Use load balancers to route requests evenly.
- Sharding: Partition games by user or game ID so each server handles a subset of games, reducing per-server load.
- Caching: Cache recent game states and partial results to avoid redundant computations.
- Event-driven Checks: Instead of checking after every move, trigger checks only on relevant state changes to reduce frequency.
- Optimize Algorithms: Use efficient data structures and incremental checks to minimize CPU usage.
- Use In-memory Databases: Store game states in fast in-memory stores like Redis to speed up access.
At 10,000 concurrent games, assuming each game requires 5 win checks per second:
- Total checks per second = 50,000
- Each check takes ~1 ms CPU time -> total CPU time = 50 seconds per second (impossible on one core)
- Need at least 50 CPU cores or 50 servers to handle load
- Memory: Each game state ~10 KB -> 10,000 games = ~100 MB RAM
- Network bandwidth: Minimal for win checks, mostly internal server communication
Start by explaining the core operation: checking win conditions per game move. Then estimate how many checks happen per second at different user scales. Identify the CPU as the first bottleneck due to computation. Discuss horizontal scaling and algorithm optimization as primary solutions. Mention caching and sharding to improve efficiency. Always justify your choices with simple numbers.
Your database handles 1000 QPS for storing game states. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: The first step is to add read replicas and implement caching to reduce database load. Also, optimize queries and consider sharding the database by game or user ID to distribute load.
