
Move validation and check detection in LLD - Scalability & System Analysis

Scalability Analysis - Move validation and check detection
Growth Table: Move Validation and Check Detection
Users / Games | Move Requests per Second | Validation Latency | CPU Usage | Memory Usage | Storage for Game States
100 users (~50 games) | ~5 moves/sec | <10 ms | Low | Low | Small (few MB)
10,000 users (~5,000 games) | ~500 moves/sec | 10-50 ms | Moderate | Moderate | Medium (GBs)
1,000,000 users (~500,000 games) | ~50,000 moves/sec | 50-200 ms | High | High | Large (TBs)
100,000,000 users (~50,000,000 games) | ~5,000,000 moves/sec | 200+ ms (unacceptable) | Very High | Very High | Very Large (PBs)
First Bottleneck

The first bottleneck is the CPU and memory on the application servers that perform move validation and check detection. This logic is computationally intensive because it requires analyzing the current game state, applying chess rules, and detecting check conditions.
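
The check-detection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular chess engine: the board encoding (a dict from (file, rank) to strings such as "wK") and the names is_attacked and is_in_check are hypothetical. It covers check detection only; full move validation would also need piece movement rules, castling, en passant, and promotion.

```python
from typing import Dict, Tuple

Square = Tuple[int, int]   # (file, rank), each in 0..7
Board = Dict[Square, str]  # hypothetical encoding, e.g. {(4, 0): "wK"}

KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def on_board(sq: Square) -> bool:
    return 0 <= sq[0] < 8 and 0 <= sq[1] < 8


def is_attacked(board: Board, target: Square, by_color: str) -> bool:
    """Return True if any piece of by_color ("w" or "b") attacks target."""
    f, r = target
    # Knights: fixed jumps, never blocked.
    for df, dr in KNIGHT_JUMPS:
        if board.get((f + df, r + dr)) == by_color + "N":
            return True
    # Enemy king on any adjacent square.
    for df, dr in ROOK_DIRS + BISHOP_DIRS:
        if board.get((f + df, r + dr)) == by_color + "K":
            return True
    # Pawns: white pawns capture toward higher ranks, black toward lower,
    # so an attacking white pawn sits one rank below the target.
    pawn_rank = r - 1 if by_color == "w" else r + 1
    for df in (-1, 1):
        if board.get((f + df, pawn_rank)) == by_color + "P":
            return True
    # Sliding pieces: walk each ray until the first occupied square.
    for dirs, sliders in ((ROOK_DIRS, "RQ"), (BISHOP_DIRS, "BQ")):
        for df, dr in dirs:
            sq = (f + df, r + dr)
            while on_board(sq):
                piece = board.get(sq)
                if piece is not None:
                    if piece[0] == by_color and piece[1] in sliders:
                        return True
                    break  # any other piece blocks the ray
                sq = (sq[0] + df, sq[1] + dr)
    return False


def is_in_check(board: Board, color: str) -> bool:
    """Return True if the king of the given color is attacked."""
    king_sq = next(sq for sq, p in board.items() if p == color + "K")
    enemy = "b" if color == "w" else "w"
    return is_attacked(board, king_sq, enemy)


# Usage: a black rook on an open file checks the white king.
open_file = {(4, 0): "wK", (4, 7): "bR"}
blocked_file = {(4, 0): "wK", (4, 3): "wP", (4, 7): "bR"}
print(is_in_check(open_file, "w"))     # True
print(is_in_check(blocked_file, "w"))  # False
```

Each call scans a bounded number of squares, so the cost per move is roughly constant; the load on a server grows with the number of moves it must validate, not with the size of any single position.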

At small scale, a single server can handle all validations quickly. As the number of users grows, CPU load increases linearly with the move request rate. Memory usage also grows, because each active game keeps its state in memory.

Eventually, the server CPU cores become saturated, causing increased latency and slower validation responses.

Scaling Solutions
  • Horizontal scaling: Add more application servers to distribute move validation load. Use a load balancer to route requests.
  • State partitioning: Partition games by user or game ID so each server handles a subset of games, reducing memory and CPU per server.
  • Caching: Cache recent validation results or partial computations (for example, results already computed for the current position) to avoid repeating heavy calculations.
  • Asynchronous processing: Validate moves asynchronously so the UI does not block, and notify clients when validation completes.
  • Offload check detection: Use specialized microservices or optimized libraries (e.g., native code) for check detection to improve performance.
  • Database optimization: Store game states efficiently and use in-memory stores (like Redis) for fast access.
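
The state-partitioning idea from the list above can be sketched as a routing function. This is a minimal example under assumptions: the function name shard_for_game, the string game IDs, and the choice of SHA-256 as a stable hash are all illustrative and do not come from a specific system.

```python
import hashlib


def shard_for_game(game_id: str, num_shards: int) -> int:
    """Map a game ID to a shard index in [0, num_shards).

    A cryptographic hash is used only because it is stable across
    processes and machines; Python's built-in hash() is randomized
    per process for strings, so it is unsuitable for routing.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(game_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Usage: every move for the same game lands on the same server,
# so that server can keep the game state in local memory.
servers = ["validator-0", "validator-1", "validator-2", "validator-3"]  # hypothetical names
target = servers[shard_for_game("game-42", len(servers))]
print(target)
```

Plain modulo hashing remaps most games whenever the number of shards changes. If servers are added or removed often, consistent hashing is a common alternative that moves only a fraction of the keys.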
Back-of-Envelope Cost Analysis
  • At 1M users with ~50,000 moves/sec, assuming each validation takes 10 ms of CPU time, the system needs 500 core-seconds of CPU per second, i.e. about 500 fully busy cores. Each core handles ~100 validations/sec, so an 8-core server handles ~800 validations/sec and ~63 servers are needed (before adding headroom).
  • Memory per game state ~10 KB, for 500,000 games = ~5 GB RAM needed just for game states.
  • Network bandwidth per move is small (~1 KB), so 50,000 moves/sec = ~50 MB/s (~400 Mbps) in aggregate, which is manageable on 1 Gbps links, especially once traffic is spread across servers.
  • Storage for historical game data grows with users; archiving old games reduces storage pressure.
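
The arithmetic in the estimates above can be reproduced with a short script. This is a sketch that uses only the figures stated in the list (10 ms per validation, 8-core servers, 10 KB per game state, 1 KB per move); the function name estimate_capacity is hypothetical, decimal units are assumed (1 GB = 1,000,000 KB), and cores are assumed to run at full utilization with no headroom.

```python
import math


def estimate_capacity(moves_per_sec: int,
                      cpu_ms_per_validation: float,
                      cores_per_server: int,
                      active_games: int,
                      kb_per_game: float,
                      kb_per_move: float) -> dict:
    """Back-of-envelope sizing; assumes 100% core utilization."""
    # CPU-seconds demanded per wall-clock second equals busy cores needed.
    cores_needed = moves_per_sec * cpu_ms_per_validation / 1000
    servers = math.ceil(cores_needed / cores_per_server)
    state_gb = active_games * kb_per_game / 1_000_000
    bandwidth_mb_per_sec = moves_per_sec * kb_per_move / 1000
    return {
        "cores_needed": cores_needed,
        "servers": servers,
        "state_gb": state_gb,
        "bandwidth_mb_per_sec": bandwidth_mb_per_sec,
        "bandwidth_mbps": bandwidth_mb_per_sec * 8,
    }


# The 1M-user row: 50,000 moves/sec across 500,000 active games.
estimate = estimate_capacity(
    moves_per_sec=50_000,
    cpu_ms_per_validation=10,
    cores_per_server=8,
    active_games=500_000,
    kb_per_game=10,
    kb_per_move=1,
)
print(estimate)
# {'cores_needed': 500.0, 'servers': 63, 'state_gb': 5.0,
#  'bandwidth_mb_per_sec': 50.0, 'bandwidth_mbps': 400.0}
```

In practice the server count would be higher, since production systems rarely run cores at full utilization.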
Interview Tip

Start by explaining the move validation process and why it is CPU intensive. Then discuss how load grows with users and moves per second. Identify the CPU and memory on app servers as the first bottleneck. Propose horizontal scaling and partitioning as primary solutions. Mention caching and asynchronous processing as optimizations. Finally, discuss database and network considerations briefly.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

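Answer: In this system the database is usually not the first limit, because move validation is CPU intensive. First add more application servers to scale validation horizontally, and partition game states to distribute load. If the database itself becomes saturated, add caching (for example, an in-memory store such as Redis) and database read replicas.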

Key Result
The cost of move validation and check detection grows linearly with the move request rate, making CPU and memory on the application servers the first bottleneck. Horizontal scaling and partitioning are key to handling growth beyond thousands of concurrent games.