| Dimension | 100 orders/day | 10,000 orders/day | 1,000,000 orders/day | 100,000,000 orders/day |
|---|---|---|---|---|
| Order State Transitions | Simple DB updates, single instance | Increased DB writes, possible queueing | High DB load, need async processing | Massive scale, distributed state management |
| System Components | Single app server, monolithic state logic | Multiple app servers, load balancer | Microservices for order states, event-driven | Global distributed services, CQRS, event sourcing |
| Database | Single relational DB instance | Read replicas, connection pooling | Sharding, partitioning by order ID or region | Multi-region DB clusters, eventual consistency |
| Message Queues | Not required or simple queue | Basic queues for async state changes | Robust event queues, retry mechanisms | Distributed event streaming platforms (Kafka, Pulsar) |
| Latency | Low, synchronous updates | Moderate, some async processing | Higher, eventual consistency accepted | Latency optimized with caching and event sourcing |
## Order State Machine in LLD: Scalability & System Analysis
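Before the scaling discussion, it helps to pin down the state machine itself. A minimal sketch (the state names and transition set here are illustrative assumptions, not a fixed standard):

```python
# Allowed transitions for an order; terminal states map to an empty set.
# State names (CREATED, PAID, ...) are illustrative assumptions.
ALLOWED = {
    "CREATED":   {"PAID", "CANCELLED"},
    "PAID":      {"SHIPPED", "CANCELLED"},
    "SHIPPED":   {"DELIVERED"},
    "DELIVERED": set(),
    "CANCELLED": set(),
}

def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Centralizing the transition table like this keeps the validity rules in one place, regardless of how the writes are later scaled.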
The database becomes the first bottleneck as order volume grows: every state transition is a write, and a single order typically generates several transitions (created, paid, shipped, delivered). From around 10,000 orders per day, write load during traffic peaks starts to cause slower response times and lock contention on hot order rows.
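Each transition is a single guarded write. A compare-and-set style UPDATE (sketched here with SQLite purely for illustration; the table schema is an assumption) prevents two concurrent requests from both applying the same transition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'CREATED')")

def try_transition(order_id: int, expected: str, target: str) -> bool:
    # The WHERE clause on the current state makes this a compare-and-set:
    # the write succeeds only if nobody else changed the state first.
    cur = conn.execute(
        "UPDATE orders SET state = ? WHERE id = ? AND state = ?",
        (target, order_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1
```

A second caller racing on the same transition sees `rowcount == 0` and can retry or report a conflict, instead of silently overwriting state.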
- Read Replicas: Offload read queries to replicas to reduce DB load.
- Connection Pooling: Efficiently manage DB connections to handle more concurrent requests.
- Asynchronous Processing: Use message queues to decouple state changes from user requests.
- Sharding: Partition the database by order ID or region to distribute load.
- Event Sourcing: Store state changes as events to improve scalability and auditability.
- Microservices: Separate order state logic into dedicated services for better scaling.
- Caching: Cache order-status responses with short TTLs where slight staleness is acceptable, to reduce DB hits; a CDN helps mainly for static assets, not per-user order state.
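The event-sourcing strategy from the list above can be sketched as an append-only log per order, with the current state derived by replaying events (the in-memory log and event shape are stand-ins for a Kafka topic or an events table):

```python
from collections import defaultdict

# Append-only event log per order; events are never updated in place,
# which keeps writes cheap and gives a full audit trail for free.
events: dict[int, list[str]] = defaultdict(list)

def record(order_id: int, new_state: str) -> None:
    # In production this would append to Kafka/Pulsar or an events table.
    events[order_id].append(new_state)

def current_state(order_id: int) -> str:
    # Replay the log; the last event is the current state.
    log = events[order_id]
    return log[-1] if log else "CREATED"
```

Derived read models (e.g. a cached "current state" table) can be rebuilt from the log at any time, which is what makes this pattern attractive at the higher tiers of the table.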
Assuming 1,000,000 orders/day (~11.6 orders/sec on average):
- DB writes: ~12 writes/sec if each order produces one state-change write; with ~5 transitions per order, closer to ~60 writes/sec.
- DB reads: assuming 10 status reads per order, ~120 reads/sec.
- Storage: each order state event ~1 KB, so ~1 GB/day for one event per order (scale proportionally with transitions per order).
- Network bandwidth: assuming 10 KB per order-state API call, ~116 KB/s (~0.9 Mbps).
- Server capacity: one app server can handle ~1,000 concurrent connections, so multiple servers behind a load balancer are needed.
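The one-event-per-order estimates above can be reproduced with a few lines of arithmetic:

```python
orders_per_day = 1_000_000
seconds_per_day = 86_400

write_qps = orders_per_day / seconds_per_day        # ~11.6 state-change writes/sec
read_qps = write_qps * 10                           # 10 reads per order -> ~116/sec
storage_per_day_gb = orders_per_day * 1_024 / 1e9   # 1 KB per event -> ~1 GB/day
bandwidth_kb_s = write_qps * 10                     # 10 KB per API call -> ~116 KB/s

print(f"{write_qps:.1f} writes/s, {read_qps:.0f} reads/s, "
      f"{storage_per_day_gb:.1f} GB/day, {bandwidth_kb_s:.0f} KB/s")
```

Note these are daily averages; real traffic is bursty, so provisioning should target peak load (often several times the average).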
Start by describing the order state machine and its transitions. Then discuss expected load and identify the first bottleneck (usually the database). Next, explain scaling strategies like asynchronous processing and sharding. Finally, mention trade-offs such as consistency vs latency and how event sourcing can help.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: First check the read/write mix. If reads dominate, add read replicas and connection pooling to offload them from the primary and reduce contention. For write pressure, put state changes behind a message queue so they are applied asynchronously instead of overloading the DB, and plan sharding if write growth continues.
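The first step can be sketched as a simple router, assuming one primary and a pool of replicas (the instance names are illustrative): every write still goes to the primary, while reads rotate across replicas.

```python
import itertools

class ReadWriteRouter:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin iterator

    def for_write(self) -> str:
        return self.primary

    def for_read(self) -> str:
        # Caveat: replicas lag the primary slightly; flows that need
        # read-your-own-writes should read from the primary instead.
        return next(self._replicas)
```

For example, `ReadWriteRouter("db-primary", ["db-r1", "db-r2"])` alternates reads between the two replicas while `for_write()` always returns the primary.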