Order Tracking State Machine in LLD - Scalability & System Analysis

| Users | Orders per Second | State Transitions per Second | Storage Size (Order States) | Latency Requirements |
|---|---|---|---|---|
| 100 | 10 | 20 | ~10 MB | Low (seconds) |
| 10,000 | 1,000 | 2,000 | ~1 GB | Medium (sub-second) |
| 1,000,000 | 100,000 | 200,000 | ~100 GB | High (milliseconds) |
| 100,000,000 | 10,000,000 | 20,000,000 | ~10 TB | Very High (milliseconds) |
The first bottleneck is the database handling state transitions. As order states update frequently, the database must handle many writes and reads per second. At around 10,000 users, the database write throughput and latency become critical because each order state change requires a write and often a read to confirm the current state.
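Before discussing the bottleneck, it helps to make the state machine itself concrete. The sketch below shows one way to model it; the state names and allowed transitions are illustrative assumptions, not taken from the source.

```python
# Minimal order state machine sketch; state names and the transition
# table are illustrative assumptions.
from enum import Enum


class OrderState(Enum):
    CREATED = "created"
    PAID = "paid"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"


# Allowed transitions: each state maps to the set of states it may move to.
TRANSITIONS = {
    OrderState.CREATED: {OrderState.PAID, OrderState.CANCELLED},
    OrderState.PAID: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: {OrderState.DELIVERED},
    OrderState.DELIVERED: set(),     # terminal state
    OrderState.CANCELLED: set(),     # terminal state
}


def transition(current: OrderState, target: OrderState) -> OrderState:
    """Validate and apply a state transition, rejecting illegal jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Every call to `transition` corresponds to one of the writes counted in the table above, which is why write throughput dominates the analysis.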
- Database Scaling: Use write-optimized databases or NoSQL stores for fast state updates. Add read replicas to handle read-heavy queries.
- Caching: Cache current order states in memory (e.g., Redis) to reduce database reads.
- Horizontal Scaling: Add more application servers behind load balancers to handle increased state transition requests.
- Sharding: Partition orders by user ID or region to distribute database load.
- Event Sourcing: Append state changes to an event log and update read models asynchronously, turning random-access row updates into cheap sequential appends.
- CDN: Useful for static content, but it has minimal impact on state machine scaling, since state transitions are dynamic, per-order writes that cannot be cached at the edge.
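The caching lever above can be sketched as a cache-aside read path. Here a plain dict stands in for Redis so the example is self-contained; in production you would use a Redis client (e.g. redis-py's `get`/`set`) against a shared instance, and the function names are hypothetical.

```python
# Cache-aside read of an order's current state.
# A plain dict stands in for Redis here (assumption for a runnable sketch).
from typing import Optional

cache: dict = {}       # order_id -> state; stand-in for Redis
database: dict = {}    # stand-in for the primary datastore


def get_order_state(order_id: str) -> Optional[str]:
    """Return the order's state, hitting the cache before the database."""
    state = cache.get(order_id)
    if state is not None:
        return state                  # cache hit: no database read
    state = database.get(order_id)    # cache miss: fall through to the DB
    if state is not None:
        cache[order_id] = state       # populate cache for subsequent reads
    return state


def set_order_state(order_id: str, state: str) -> None:
    """Write-through: update the database, then refresh the cache."""
    database[order_id] = state
    cache[order_id] = state
```

Because order status is read far more often than it changes, most reads are served from memory and never reach the database.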
- At 10,000 users: ~1,000 orders/sec, ~2,000 state transitions/sec.
- Database must handle ~2,000 writes/sec (state transitions) and ~3,000 reads/sec (status queries plus current-state checks).
- Storage: Each order state record ~1 KB, so 1 million orders ~1 GB storage.
- Network bandwidth: Assuming 1 KB per state update, 2,000 updates/sec = ~2 MB/s bandwidth.
- At 1 million users: 100,000 orders/sec, 200,000 state transitions/sec, requiring distributed databases and caching.
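The back-of-envelope figures above follow from two assumptions stated in the source: one order/sec per 10 users with two transitions per order, and ~1 KB per order state record. A quick calculation reproduces them:

```python
# Capacity estimate at 10,000 users, using the ratios from the table above.
users = 10_000
orders_per_sec = users // 10              # 1 order/sec per 10 users
transitions_per_sec = orders_per_sec * 2  # ~2 state transitions per order

record_size_kb = 1                        # ~1 KB per order state record
orders_stored = 1_000_000
storage_gb = orders_stored * record_size_kb / 1_000_000

bandwidth_mb_s = transitions_per_sec * record_size_kb / 1_000

print(orders_per_sec, transitions_per_sec)  # 1000 2000
print(storage_gb, bandwidth_mb_s)           # 1.0 2.0
```

Scaling the same ratios by 100x gives the 1-million-user row: 100,000 orders/sec and 200,000 transitions/sec, well beyond a single database node.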
Start by explaining the order state machine and its transitions. Then discuss expected load and how it grows with users. Identify the database as the first bottleneck due to frequent writes. Propose caching and sharding to reduce load. Mention horizontal scaling of app servers. Always justify why each solution fits the bottleneck.
Your database handles 1,000 QPS for order state updates. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas and implement caching to reduce direct database reads. Consider sharding the database to distribute write load. Also, horizontally scale application servers to handle increased requests.
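A quick calculation shows why caching is the right first move. Assuming the ~60/40 read/write mix from the estimates above and a 90% cache hit rate (an assumed figure, not from the source), the load that actually reaches the primary database drops sharply:

```python
# Effect of a cache on database load at 10,000 QPS.
# read_fraction matches the ~3,000 reads / ~2,000 writes mix estimated above;
# cache_hit_rate is an assumption for illustration.
total_qps = 10_000
read_fraction = 0.6
cache_hit_rate = 0.9

reads = total_qps * read_fraction
writes = total_qps - reads
db_reads = reads * (1 - cache_hit_rate)  # only cache misses reach the DB
db_qps = db_reads + writes

print(int(db_qps))  # 4600: 4000 writes + 600 read misses
```

Caching removes most read load, but the remaining ~4,000 writes/sec still hit the primary, which is why sharding the write path is the necessary second step.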