Game state management in LLD - Scalability & System Analysis

| Users | Game State Size | Server Load | Latency | Storage Needs |
|---|---|---|---|---|
| 100 | Small (a few MB) | Single server handles all | Low (real-time) | Minimal, local storage |
| 10,000 | Medium (GBs) | Multiple servers behind a load balancer | Low to medium | Distributed cache + DB |
| 1,000,000 | Large (TBs) | Cluster of servers, sharded DB | Medium (optimized) | Sharded DB + caching layers |
| 100,000,000 | Very large (PBs) | Massive clusters, global distribution | Medium to high (edge caching) | Multi-region DB, archival storage |
The first bottleneck is the database that stores and retrieves game states. As the user count grows, read/write volume rises rapidly, while a single database instance can serve only a limited number of queries per second (QPS), typically 5,000-10,000 QPS for a relational DB. Beyond that point, latency climbs and requests queue up, delaying game state updates and retrievals. Several strategies relieve this pressure:
- Read Replicas: Use read replicas to distribute read queries and reduce load on the primary database.
- Caching: Implement in-memory caches (e.g., Redis) for frequently accessed game states to reduce DB hits.
- Sharding: Partition the database by user ID or game session to spread load across multiple DB instances.
- Horizontal Scaling: Add more application servers behind a load balancer to handle more concurrent connections.
- Eventual Consistency: Use asynchronous updates where strict real-time consistency is not critical to reduce DB write pressure.
- Edge Caching: For global users, cache game state snapshots closer to users to reduce latency.
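The sharding strategy above can be sketched with a simple hash-based router. This is a minimal illustration, not a production design: the shard count (`NUM_SHARDS = 16`) and the use of MD5 are assumptions for the example, and real systems often use consistent hashing so that resharding moves fewer keys.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count for illustration

def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a user's game state to a shard by hashing the user ID.

    Hashing (rather than e.g. ranges of IDs) spreads load evenly and
    keeps the mapping deterministic: the same user always lands on
    the same shard.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# The same user ID always routes to the same shard
assert shard_for_user("player-42") == shard_for_user("player-42")
```

The trade-off to mention in an interview: modulo hashing remaps most keys when `num_shards` changes, which is why consistent hashing or a lookup-table of shard assignments is preferred when shards are added over time.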
Assuming 1 million concurrent users, each sending 1 state update per second:
- Requests per second: ~1,000,000 QPS (too high for single DB)
- Storage: If each game state is 10 KB, total active data ~10 GB in memory/cache; historical data grows daily.
- Bandwidth: 1,000,000 updates * 10 KB = ~10 GB/s (requires high network capacity)
This shows the need for sharding, caching, and horizontal scaling to handle load and bandwidth.
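The estimate above can be reproduced as a short back-of-envelope calculation. The per-DB QPS ceiling of 10,000 is the upper bound quoted earlier; all other inputs come from the stated assumptions (1M users, 1 update/sec, 10 KB per state).

```python
# Capacity estimate for 1M concurrent users, 1 update/sec each
users = 1_000_000
updates_per_user_per_sec = 1
state_size_kb = 10
single_db_qps = 10_000  # upper bound for one relational DB instance

qps = users * updates_per_user_per_sec          # total write QPS
active_data_gb = users * state_size_kb / 1e6    # hot data held in cache
bandwidth_gb_per_sec = qps * state_size_kb / 1e6
shards_needed = qps // single_db_qps            # DBs needed to absorb writes

print(f"{qps:,} QPS, {active_data_gb:.0f} GB hot data, "
      f"{bandwidth_gb_per_sec:.0f} GB/s, >= {shards_needed} shards")
# 1,000,000 QPS, 10 GB hot data, 10 GB/s, >= 100 shards
```

The last line is the justification for sharding: even at the optimistic 10,000 QPS per instance, roughly 100 database shards are needed before caching reduces the write load at all.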
Start by explaining the components involved: game clients, servers, database, and cache. Discuss how game state updates flow and where bottlenecks appear as users grow. Then, propose scaling strategies step-by-step, focusing on database scaling first, followed by application and network layers. Use real numbers to justify your choices and show understanding of trade-offs.
Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas and implement caching to reduce direct database load before considering sharding or adding more servers.
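The caching half of that answer is usually the cache-aside pattern: check the cache first and fall back to the database only on a miss. Below is a minimal sketch where plain dicts stand in for Redis and the primary DB; the class and method names are illustrative, not from any library.

```python
class GameStateStore:
    """Cache-aside read path: cache hit avoids the DB entirely."""

    def __init__(self):
        self.cache = {}   # stands in for an in-memory cache like Redis
        self.db = {}      # stands in for the primary database
        self.db_reads = 0  # counter to show how many reads reach the DB

    def get_state(self, user_id):
        # 1. Try the cache first
        if user_id in self.cache:
            return self.cache[user_id]
        # 2. Cache miss: read from the DB and populate the cache
        self.db_reads += 1
        state = self.db.get(user_id)
        if state is not None:
            self.cache[user_id] = state
        return state

store = GameStateStore()
store.db["u1"] = {"level": 3}
store.get_state("u1")  # miss: hits the DB, fills the cache
store.get_state("u1")  # hit: served from cache
print(store.db_reads)  # 1
```

Repeated reads of a hot game state hit the database once; every subsequent read is absorbed by the cache, which is exactly the load reduction the answer relies on before resorting to sharding. A real implementation would also need a TTL or an invalidation-on-write rule so cached states do not go stale.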