| Scale | Users | Reviews per day | Storage | Traffic | System Changes |
|---|---|---|---|---|---|
| Small | 100 | 500 | ~10 MB | Low | Single app server, single DB instance |
| Medium | 10,000 | 50,000 | ~1 GB | Moderate | DB read replicas, caching, load balancer |
| Large | 1,000,000 | 5,000,000 | ~100 GB | High | Sharded DB, CDN for images, distributed cache |
| Very Large | 100,000,000 | 500,000,000 | ~10 TB | Very High | Multi-region deployment, microservices, advanced sharding |
Rating and review system in LLD - Scalability & System Analysis
At small to medium scale, the database is the first bottleneck because it must handle many read and write queries for reviews and ratings. Writes increase with new reviews, and reads increase with users fetching reviews. The DB CPU and disk I/O limit throughput.
- Database Read Replicas: Offload read queries to replicas to reduce load on primary DB.
- Caching: Use in-memory caches (e.g., Redis) for frequently read data like average ratings.
- Horizontal Scaling: Add more application servers behind a load balancer to handle more user requests.
- Sharding: Partition the database by product or user ID to distribute write and read load.
- CDN: Serve review images and static content via CDN to reduce bandwidth and latency.
- Asynchronous Processing: Use message queues to handle heavy write operations asynchronously.
- Requests per second (RPS): At 1M users with 5M reviews/day, writes are 5M / 86,400 s ≈ 58/sec (rounded to ~60). Assuming 100 reads per review written, reads are ~6,000/sec, for ~6,060 RPS total.
- Storage: Average review size ~2 KB (text + metadata). 5M reviews/day -> 10 GB/day. For 10 days retention -> ~100 GB storage.
- Bandwidth: Assuming 100 KB per review fetch (including images), 6000 reads/sec -> ~600 MB/s (~4.8 Gbps). Requires CDN and network scaling.
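The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name `estimate_load` and its parameter names are hypothetical, and the defaults are the assumptions stated in the bullets (5M reviews/day, 100 reads per review, 2 KB per review, 10 days of retention, 100 KB per fetch). The unrounded results come out slightly below the rounded figures used in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def estimate_load(reviews_per_day=5_000_000, reads_per_review=100,
                  review_size_kb=2, retention_days=10, fetch_size_kb=100):
    """Back-of-envelope load estimate (hypothetical helper, decimal units)."""
    writes_per_sec = reviews_per_day / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * reads_per_review
    # 1 GB = 1,000,000 KB in decimal units
    storage_gb = reviews_per_day * review_size_kb * retention_days / 1_000_000
    bandwidth_mb_per_sec = reads_per_sec * fetch_size_kb / 1_000
    bandwidth_gbps = bandwidth_mb_per_sec * 8 / 1_000
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "total_rps": writes_per_sec + reads_per_sec,
        "storage_gb": storage_gb,
        "bandwidth_mb_per_sec": bandwidth_mb_per_sec,
        "bandwidth_gbps": bandwidth_gbps,
    }

estimate = estimate_load()
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

Changing one input (for example, the read-to-write ratio) and rerunning is a quick way to see which assumption dominates the total.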
Start by clarifying scale and usage patterns. Identify bottlenecks step-by-step: database, app servers, network. Propose solutions matching bottlenecks: caching for reads, sharding for writes, CDN for media. Discuss trade-offs and monitoring strategies.
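For the sharding step, one way to route a product's reviews to a database shard is to hash the product ID. The sketch below is illustrative only: `shard_for` is a hypothetical helper, and plain modulo hashing is assumed for simplicity. With modulo hashing, changing the shard count remaps most keys, which is why consistent hashing is often discussed as an alternative.

```python
import hashlib

def shard_for(product_id: str, num_shards: int) -> int:
    """Map a product ID to a shard index in [0, num_shards).

    Hypothetical helper: uses a stable hash so the same product
    always lands on the same shard across processes and restarts.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(product_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards

# All reviews for one product go to the same shard.
shard = shard_for("product-123", num_shards=4)
print(f"product-123 -> shard {shard}")
```

Python's built-in `hash()` is avoided here because string hashes are randomized per process by default, which would send the same product to different shards after a restart.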
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas to offload read queries and implement caching to reduce DB load before considering sharding or adding more app servers.
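The caching part of this answer is often implemented as a cache-aside read path. The following is a minimal sketch in which an in-process dictionary stands in for a cache such as Redis. The class `RatingCache`, the `load_from_db` callback, and the 60-second TTL are assumptions made for illustration.

```python
import time

class RatingCache:
    """Cache-aside for average ratings (in-memory stand-in for a real cache)."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical callback: product_id -> average
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # product_id -> (value, expires_at)

    def get_average(self, product_id):
        now = self._clock()
        entry = self._entries.get(product_id)
        if entry is not None and entry[1] > now:
            return entry[0]            # cache hit: no database query
        value = self._load(product_id)  # cache miss: query the database
        self._entries[product_id] = (value, now + self._ttl)
        return value

    def invalidate(self, product_id):
        """Call after a review is added, edited, or deleted."""
        self._entries.pop(product_id, None)

# Usage with a fake database that counts how often it is queried.
db_calls = []
def fake_db_lookup(product_id):
    db_calls.append(product_id)
    return 4.2

cache = RatingCache(fake_db_lookup)
cache.get_average("p1")
cache.get_average("p1")
print(len(db_calls))  # 1
```

The second read is served from the cache, so the database sees one query instead of two. Invalidating on writes keeps the cached average from going stale between TTL expirations.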
Practice
Solution
Step 1: Understand the system's goal
A rating and review system is designed to gather user opinions and ratings about products.
Step 2: Identify the main function
It calculates average ratings to help other users make decisions quickly.
Final Answer:
To collect user feedback and calculate average product ratings -> Option B
Quick Check:
Rating system = Collect feedback + average rating [OK]
- Confusing rating system with payment or inventory systems
- Thinking it manages user credentials
- Assuming it handles shipping or delivery
Solution
Step 1: Consider lookup efficiency
Quick lookup by product ID requires a data structure with fast key-based access.
Step 2: Choose appropriate structure
A hash map (dictionary) allows O(1) average time to find reviews by product ID.
Final Answer:
Hash map with product ID as key and list of reviews as value -> Option A
Quick Check:
Fast lookup = Hash map [OK]
- Using arrays without indexing causes slow searches
- Linked lists have O(n) lookup time
- Stacks do not support direct lookup by key
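The hash-map answer can be sketched with a Python dictionary keyed by product ID. The function names and the review fields (`user_id`, `rating`, `text`) are assumptions chosen for illustration.

```python
from collections import defaultdict

# product_id -> list of review records
reviews_by_product = defaultdict(list)

def add_review(product_id, user_id, rating, text):
    """Append a review to the product's list (O(1) average)."""
    reviews_by_product[product_id].append(
        {"user_id": user_id, "rating": rating, "text": text}
    )

def get_reviews(product_id):
    """Look up all reviews for a product (O(1) average to find the list)."""
    # .get avoids creating an empty entry for unknown products.
    return reviews_by_product.get(product_id, [])

add_review("p1", "alice", 5, "Great product")
add_review("p1", "bob", 4, "Works well")
print(len(get_reviews("p1")))       # 2
print(get_reviews("unknown"))       # []
```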
current_avg = 4.0
num_reviews = 5
new_rating = 5
new_avg = (current_avg * num_reviews + new_rating) / (num_reviews + 1)
What is the value of new_avg?
Solution
Step 1: Calculate total rating sum before new review
Total sum = current_avg * num_reviews = 4.0 * 5 = 20
Step 2: Add new rating and compute new average
New sum = 20 + 5 = 25
New average = 25 / (5 + 1) = 25 / 6 ≈ 4.1667
Final Answer:
4.17 -> Option A
Quick Check:
Average update formula ≈ 4.17 [OK]
- Forgetting to add new rating to total sum
- Dividing by old count instead of count+1
- Rounding too early causing wrong average
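The same update can be wrapped in a small function that returns the new count along with the new average, so that rounding happens only when the value is displayed. The name `updated_average` is a hypothetical helper, not part of the original snippet.

```python
def updated_average(current_avg, num_reviews, new_rating):
    """Return (new_avg, new_count) after adding one rating."""
    new_count = num_reviews + 1
    # Recover the old total, add the new rating, divide by the NEW count.
    new_avg = (current_avg * num_reviews + new_rating) / new_count
    return new_avg, new_count

new_avg, new_count = updated_average(4.0, 5, 5)
print(round(new_avg, 2), new_count)  # 4.17 6
```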
Solution
Step 1: Understand average calculation
Average = sum of ratings / count of reviews. Both must be accurate.
Step 2: Identify deletion impact
If count is not decreased after deleting a review, average calculation divides by wrong count.
Final Answer:
Not updating the count of reviews after deletion -> Option C
Quick Check:
Count mismatch causes wrong average [OK]
- Ignoring count update after deletion
- Assuming recalculation always fixes average
- Confusing data structure choice with calculation error
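To make the deletion bug concrete, the sketch below removes one rating from a running sum and count, then contrasts the result with what happens when the count is left unchanged. The function name `average_after_delete` and the sample ratings are assumptions for illustration.

```python
def average_after_delete(rating_sum, count, deleted_rating):
    """Return (new_avg, new_sum, new_count) after deleting one rating."""
    if count <= 0:
        raise ValueError("no reviews to delete")
    new_sum = rating_sum - deleted_rating
    new_count = count - 1              # the step the buggy version forgets
    new_avg = new_sum / new_count if new_count > 0 else None
    return new_avg, new_sum, new_count

# Ratings 5, 4, 3 -> sum 12, count 3. Delete the rating of 3.
correct_avg, new_sum, new_count = average_after_delete(12, 3, 3)
buggy_avg = new_sum / 3                # count not decremented

print(correct_avg)  # 4.5
print(buggy_avg)    # 3.0
```

The remaining ratings are 5 and 4, so 4.5 is the true average. The buggy version still divides by 3 and reports 3.0.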
Solution
Step 1: Consider query and update load
Millions of products and users mean many queries and updates.
Step 2: Choose efficient strategy
Precomputing average and count and updating them incrementally avoids scanning all reviews each time.
Step 3: Evaluate other options
Computing average on each query is slow; no indexes cause slow lookups; caching only latest review misses full rating info.
Final Answer:
Maintain precomputed average and count, update incrementally on review changes -> Option D
Quick Check:
Precompute + incremental update = scalable [OK]
- Recomputing averages on every query
- Ignoring indexing and caching strategies
- Caching incomplete data causing stale info
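A minimal sketch of the precompute-and-update strategy across many products follows. It stores a running sum and count per product rather than the average itself, which is one possible design that avoids accumulating floating-point error. The class `RatingAggregates` and its method names are hypothetical, and a real system would persist these values in the database instead of in memory.

```python
class RatingAggregates:
    """Per-product running totals so reads never scan the review list."""

    def __init__(self):
        self._totals = {}  # product_id -> [rating_sum, count]

    def add_review(self, product_id, rating):
        totals = self._totals.setdefault(product_id, [0, 0])
        totals[0] += rating
        totals[1] += 1

    def update_review(self, product_id, old_rating, new_rating):
        # An edited review changes the sum but not the count.
        self._totals[product_id][0] += new_rating - old_rating

    def average(self, product_id):
        """O(1) read; returns None when the product has no reviews."""
        rating_sum, count = self._totals.get(product_id, (0, 0))
        return rating_sum / count if count else None

aggregates = RatingAggregates()
aggregates.add_review("p1", 4)
aggregates.add_review("p1", 5)
print(aggregates.average("p1"))  # 4.5
aggregates.update_review("p1", old_rating=4, new_rating=3)
print(aggregates.average("p1"))  # 4.0
```

Each write touches two numbers for one product, and each read is a single lookup, regardless of how many reviews the product has.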
