
Inventory management in HLD - Scalability & System Analysis

Scalability Analysis - Inventory Management
Growth Table: Inventory Management System
| Scale      | Users       | Inventory Items | Requests per Second (RPS) | Data Storage | Key Changes |
|------------|-------------|-----------------|---------------------------|--------------|-------------|
| Small      | 100         | 10,000          | 50                        | 1 GB         | Single server, monolithic DB, simple caching |
| Medium     | 10,000      | 1,000,000       | 5,000                     | 100 GB       | DB read replicas, app server scaling, caching layer |
| Large      | 1,000,000   | 100,000,000     | 500,000                   | 10 TB        | DB sharding, distributed cache, load balancers, async processing |
| Very Large | 100,000,000 | 10,000,000,000  | 50,000,000                | 1 PB+        | Multi-region deployment, CDN, advanced partitioning, event-driven architecture |
First Bottleneck

At small scale, the database is the first bottleneck because it handles all inventory reads and writes. As users and inventory grow, the single database server struggles with query load and data size.

At medium scale, the application servers can become CPU and memory bottlenecks due to increased request processing and business logic.

At large scale, network bandwidth and data partitioning become critical as data volume and traffic grow beyond single data center limits.

Scaling Solutions
  • Database Scaling: Use read replicas to distribute read traffic. Implement sharding to split data by inventory categories or regions.
  • Caching: Add a distributed cache (e.g., Redis) to store frequently accessed inventory data and reduce DB load.
  • Application Scaling: Horizontally scale app servers behind load balancers to handle more concurrent users.
  • Async Processing: Use message queues for inventory updates to smooth spikes and improve responsiveness.
  • Network & Storage: Use CDNs for static content and multi-region deployments to reduce latency and bandwidth bottlenecks.
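The caching bullet above is usually implemented as the cache-aside pattern. A minimal sketch, with plain dicts standing in for the Redis cache and the inventory table so it runs standalone (the SKU, TTL, and stock values are made up for illustration):

```python
import time

# Stand-ins for the real stores: one dict plays the Redis cache, one the DB.
cache: dict[str, tuple[float, int]] = {}  # sku -> (expiry_timestamp, stock)
database = {"sku-123": 42}                # authoritative inventory table

TTL_SECONDS = 30  # short TTL keeps stale stock counts bounded

def get_stock(sku: str) -> int:
    """Cache-aside read: try the cache, fall back to the DB, then populate."""
    entry = cache.get(sku)
    if entry and entry[0] > time.time():
        return entry[1]                   # cache hit
    stock = database[sku]                 # cache miss: read the DB
    cache[sku] = (time.time() + TTL_SECONDS, stock)
    return stock

def update_stock(sku: str, delta: int) -> None:
    """Update the DB, then invalidate the cache so readers re-fetch."""
    database[sku] += delta
    cache.pop(sku, None)

print(get_stock("sku-123"))   # 42 (miss, then cached)
update_stock("sku-123", -2)
print(get_stock("sku-123"))   # 40 (fresh read after invalidation)
```

Invalidate-on-write (rather than updating the cache in place) is the simpler choice here: it avoids a race where a concurrent writer leaves a stale count in the cache.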
Back-of-Envelope Cost Analysis
  • At 10,000 users and 5,000 RPS, expect ~100 GB of storage for inventory data and metadata.
  • If each app server handles ~3,000 concurrent connections, 2 app servers are needed at this scale.
  • A primary with read replicas can serve ~10,000 read QPS, but all writes still land on the primary, so heavy write growth requires sharding.
  • Network bandwidth: a 1 Gbps link (~125 MB/s) supports ~10,000 RPS when payloads are small (~10 KB).
  • Cache sizing depends on the hot-data working set; a 10-20 GB Redis cluster is typical at medium scale.
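The arithmetic behind these bullets fits in a few lines. A sketch using the stated assumptions (5,000 RPS, ~3,000 connections per server, ~10 KB payloads, 1 Gbps ~ 125 MB/s):

```python
# Back-of-envelope numbers for the medium scale described above.
rps = 5_000
conns_per_server = 3_000
payload_kb = 10
link_mb_s = 125  # 1 Gbps ~ 125 MB/s

app_servers = -(-rps // conns_per_server)        # ceiling division
bandwidth_mb_s = rps * payload_kb / 1000         # offered load on the link
max_rps_on_link = link_mb_s * 1000 // payload_kb # link's RPS ceiling

print(app_servers)       # 2 app servers
print(bandwidth_mb_s)    # 50.0 MB/s, well under the 125 MB/s link
print(max_rps_on_link)   # ~12,500 RPS ceiling at 10 KB payloads
```

The 12,500 RPS ceiling is consistent with the "~10,000 RPS" bullet once you leave headroom for protocol overhead and bursts.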
Interview Tip

Start by defining the scale and key metrics (users, requests, data size). Identify the first bottleneck logically (usually DB). Then discuss scaling strategies step-by-step: caching, read replicas, sharding, app scaling, and network optimizations. Always justify why each solution fits the bottleneck.

Self Check Question

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?

Answer: Assuming the workload is read-heavy (typical for inventory lookups), add read replicas to distribute read queries and reduce load on the primary database. This is the fastest way to scale read capacity without a major redesign; if writes are the growing share, sharding or async write queues come next.
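Read replicas only help if the application routes reads away from the primary. A minimal router sketch (the connection names are hypothetical stand-ins for real DB client handles):

```python
import itertools

# Hypothetical connection handles; in practice these would be DB clients.
PRIMARY = "primary"
REPLICAS = ["replica-1", "replica-2", "replica-3"]

_replica_cycle = itertools.cycle(REPLICAS)

def route(query: str) -> str:
    """Send writes to the primary; round-robin reads across replicas."""
    is_write = query.lstrip().upper().startswith(
        ("INSERT", "UPDATE", "DELETE"))
    return PRIMARY if is_write else next(_replica_cycle)

print(route("SELECT stock FROM inventory WHERE sku = 'sku-123'"))  # replica-1
print(route("UPDATE inventory SET stock = stock - 1"))             # primary
print(route("SELECT * FROM inventory"))                            # replica-2
```

In an interview it is worth noting the trade-off this routing introduces: replicas lag the primary slightly, so reads that must see the latest write (e.g. checkout stock checks) should still go to the primary.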

Key Result
Inventory management systems first hit database bottlenecks as users and data grow; scaling requires caching, read replicas, and sharding combined with app server scaling and network optimizations.