| Scale | Users | Inventory Items | Requests per Second (RPS) | Data Storage | Key Changes |
|---|---|---|---|---|---|
| Small | 100 | 10,000 | 50 | 1 GB | Single server, monolithic DB, simple caching |
| Medium | 10,000 | 1,000,000 | 5,000 | 100 GB | DB read replicas, app server scaling, caching layer |
| Large | 1,000,000 | 100,000,000 | 500,000 | 10 TB | DB sharding, distributed cache, load balancers, async processing |
| Very Large | 100,000,000 | 10,000,000,000 | 50,000,000 | 1 PB+ | Multi-region deployment, CDN, advanced partitioning, event-driven architecture |
## Inventory management in HLD - Scalability & System Analysis
At small scale, the database is the first bottleneck because it handles all inventory reads and writes. As users and inventory grow, the single database server struggles with query load and data size.
At medium scale, the application servers can become CPU and memory bottlenecks due to increased request processing and business logic.
At large scale, network bandwidth and data partitioning become critical as data volume and traffic grow beyond the limits of a single data center.
- Database Scaling: Use read replicas to distribute read traffic. Implement sharding to split data by inventory categories or regions.
- Caching: Add a distributed cache (e.g., Redis) to store frequently accessed inventory data and reduce DB load.
- Application Scaling: Horizontally scale app servers behind load balancers to handle more concurrent users.
- Async Processing: Use message queues for inventory updates to smooth spikes and improve responsiveness.
- Network & Storage: Use CDNs for static content and multi-region deployments to reduce latency and bandwidth bottlenecks.
- At 10,000 users with 5,000 RPS, expect ~100 GB storage for inventory data and metadata.
- Each app server handles ~3,000 concurrent connections, so ~2 app servers are needed to serve 5,000 RPS at this scale.
- Database can handle ~10,000 read QPS with read replicas; writes still go through the single primary, so write-heavy growth requires separate write scaling (e.g., sharding).
- Network bandwidth: a 1 Gbps link (~125 MB/s) can support ~10,000 RPS if payloads are small (~10 KB each, i.e., ~100 MB/s total).
- Cache memory sizing depends on hot data size; 10-20 GB Redis cluster typical at medium scale.
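The estimates above can be reproduced with a short back-of-envelope calculation. All inputs below are the assumed figures from this document (5,000 RPS, ~3,000 connections per server, ~10 KB payloads, 1 Gbps link), not measured values.

```python
import math

# Assumed inputs for the medium-scale scenario in this document.
rps = 5_000
conns_per_server = 3_000   # assumed per-app-server concurrency ceiling
payload_kb = 10            # assumed average response payload
link_mbps = 1_000          # 1 Gbps uplink

app_servers = math.ceil(rps / conns_per_server)
bandwidth_mb_s = rps * payload_kb / 1_000   # MB/s needed (1 MB = 1,000 KB)
link_mb_s = link_mbps / 8                   # 1 Gbps is about 125 MB/s

print(f"app servers needed: {app_servers}")                    # -> 2
print(f"bandwidth: {bandwidth_mb_s:.0f} of {link_mb_s:.0f} MB/s")  # -> 50 of 125
```

At 5,000 RPS the link is only at ~40% utilization, which is why bandwidth becomes the concern at large scale rather than medium scale.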
Start by defining the scale and key metrics (users, requests, data size). Identify the first bottleneck logically (usually DB). Then discuss scaling strategies step-by-step: caching, read replicas, sharding, app scaling, and network optimizations. Always justify why each solution fits the bottleneck.
Question: Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: Assuming the workload is read-heavy (typical for inventory systems), add read replicas to distribute read queries and reduce load on the primary database. This is the fastest way to scale read capacity without a major redesign; writes still go to the primary and would need sharding later.
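Read replicas only help if reads are actually routed away from the primary. The sketch below shows one minimal approach: split by statement type in application code, round-robining reads across replicas. The endpoint names are placeholders; real deployments usually put this logic in a driver or proxy layer rather than hand-rolling it.

```python
import itertools

# Placeholder endpoints for one primary and three read replicas.
PRIMARY = "db-primary:5432"
REPLICAS = ["db-replica-1:5432", "db-replica-2:5432", "db-replica-3:5432"]
_round_robin = itertools.cycle(REPLICAS)

def route(sql: str) -> str:
    """Send writes to the primary; spread SELECTs across replicas round-robin."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return next(_round_robin) if is_read else PRIMARY
```

One caveat this sketch ignores: replication lag means a read routed to a replica may briefly miss a just-committed write, so read-your-own-writes paths should be pinned to the primary.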
