| Users | Requests per Second | Database Load | Cache Usage | Network Traffic | System Behavior |
|---|---|---|---|---|---|
| 100 users | ~10-50 RPS | Single DB instance handles writes and reads | Minimal caching needed | Low bandwidth | System runs smoothly on a single server |
| 10,000 users | ~1,000 RPS | DB under moderate load, some read replicas needed | Cache frequently accessed data (holds, availability) | Moderate bandwidth, load balancer introduced | Latency may increase, need for caching and replicas |
| 1,000,000 users | ~50,000 RPS | DB write bottleneck, sharding required | Heavy caching, distributed cache cluster | High bandwidth, CDN for static content | Complex coordination for holds, distributed locking |
| 100,000,000 users | ~5,000,000 RPS | Multiple DB clusters, global sharding | Multi-level caching, edge caches | Very high bandwidth, global CDN, message queues | Eventual consistency, asynchronous processing |
Reservation and hold system in LLD - Scalability & System Analysis
The database is the first bottleneck because reservation and hold systems need strongly consistent writes to avoid double booking. As request volume grows, database write throughput caps how many holds and reservations the system can process in real time.
- Read Replicas: Offload read queries like availability checks to replicas to reduce DB load.
- Caching: Use distributed caches (e.g., Redis) for frequently accessed data such as seat availability and hold status.
- Sharding: Partition the database by resource (e.g., venue, event) to spread write load across multiple DB instances.
- Horizontal Scaling: Add more application servers behind load balancers to handle increased traffic.
- Distributed Locking: Implement distributed locks or consensus protocols to prevent double booking in a distributed environment.
- Asynchronous Processing: Use message queues for non-critical updates to improve responsiveness.
- CDN: Use CDNs for static content and possibly for caching availability snapshots to reduce load.
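
The sharding item above can be illustrated with a minimal routing sketch. It assumes simple hash-modulo placement keyed by an event or venue ID; the function name `shard_for` and the ID format are hypothetical, not part of any particular database's API.

```python
import hashlib


def shard_for(resource_id: str, num_shards: int) -> int:
    """Map a resource (e.g. an event or venue ID) to a shard index."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash: Python's built-in hash() for strings is salted
    # per process, so it would route the same ID differently across servers.
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for event_id in ("event-1", "event-2", "event-3"):
        print(event_id, "-> shard", shard_for(event_id, 8))
```

Because every hold and reservation for one event lands on the same shard, the double-booking check can stay inside a single database. A known trade-off of modulo placement is that changing `num_shards` remaps most keys; consistent hashing is a common alternative when shards are added often.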
- At 10,000 users with ~1,000 RPS, a single DB instance (~5,000 QPS capacity) can handle writes and reads with caching.
- At 1,000,000 users (~50,000 RPS), DB write capacity is exceeded; sharding and multiple DB clusters needed.
- Storage: Each reservation record ~1 KB; 1M reservations = ~1 GB storage, manageable with modern DBs.
- Network bandwidth: 1,000 RPS with ~1 KB payload = ~1 MB/s; scales linearly with users.
- Cache memory: A Redis cluster with tens of GB of RAM to hold hot data for fast access.
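
The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch: the function name `estimate_load` is hypothetical, decimal units are used (1 MB = 1,000 KB), and it pessimistically assumes every request reaches the database, which caching would reduce in practice.

```python
def estimate_load(rps, payload_kb=1.0, reservations=1_000_000,
                  record_kb=1.0, single_db_qps=5_000):
    """Back-of-envelope figures using the per-request and per-record sizes above."""
    bandwidth_mb_s = rps * payload_kb / 1_000            # KB/s -> MB/s
    storage_gb = reservations * record_kb / 1_000_000    # KB -> GB
    return {
        "bandwidth_mb_s": bandwidth_mb_s,
        "storage_gb": storage_gb,
        # Assumption: no cache hits, so all traffic lands on one DB instance.
        "exceeds_single_db": rps > single_db_qps,
    }


if __name__ == "__main__":
    print(estimate_load(1_000))    # the 10,000-user tier
    print(estimate_load(50_000))   # the 1,000,000-user tier
```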
Start by clarifying system requirements and scale. Identify the critical consistency needs for reservations. Discuss the database as the first bottleneck and propose caching and sharding. Explain how distributed locking prevents double booking. Finally, mention asynchronous processing and CDN use for scalability and performance.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Introduce read replicas and caching to offload read queries, then consider sharding the database to distribute write load and prevent bottlenecks.
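
The caching half of that answer can be sketched as a cache-aside read path. This is an in-memory illustration only: the class `AvailabilityCache`, the `load_from_db` callback, and the 2-second TTL are assumptions made for the example, and a real deployment would more likely use a shared cache such as Redis.

```python
import time
from typing import Callable, Dict, Tuple


class AvailabilityCache:
    """Cache-aside reads: serve from the cache, fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], int],
                 ttl_seconds: float = 2.0, clock=time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}  # event_id -> (fresh_until, seats)

    def get(self, event_id: str) -> int:
        now = self._clock()
        entry = self._entries.get(event_id)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit, no DB query
        value = self._load(event_id)             # miss or stale entry: query the DB
        self._entries[event_id] = (now + self._ttl, value)
        return value

    def invalidate(self, event_id: str) -> None:
        """Call after a hold or reservation changes availability."""
        self._entries.pop(event_id, None)
```

Cached availability is only advisory here: the write path that places a hold still has to check the authoritative store, otherwise a stale read could lead to double booking.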
Practice
Solution
Step 1: Understand the role of a hold
A hold temporarily blocks a resource to prevent others from booking it while the user decides.
Step 2: Differentiate hold from reservation
A reservation is permanent until canceled, while a hold expires if not confirmed.
Final Answer:
To temporarily block a resource before final booking -> Option D
Quick Check:
Hold = Temporary block [OK]
- Confusing hold with permanent reservation
- Thinking holds never expire
- Assuming holds cancel reservations
Solution
Step 1: Identify requirements for hold tracking
We need fast lookup by hold ID and efficient expiration handling.
Step 2: Choose data structures
A hash map allows quick hold lookup; a priority queue orders holds by expiration for timely removal.
Final Answer:
Hash map with timestamps and a priority queue for expirations -> Option C
Quick Check:
Hash map + priority queue = efficient hold tracking [OK]
- Using unordered arrays causing slow expiration checks
- Choosing stack which is LIFO, not suitable for expirations
- Ignoring timestamps in data structure
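
The hash map plus priority queue combination from the solution above might look like the following sketch. The class name `HoldTracker` and its methods are hypothetical; it uses Python's `dict` and `heapq`, with lazy deletion so that removing a hold does not require searching the heap.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class HoldTracker:
    """Hash map for O(1) lookup by hold ID, min-heap ordered by expiration time."""

    def __init__(self) -> None:
        self._holds: Dict[str, Tuple[str, float]] = {}    # hold_id -> (resource_id, expires_at)
        self._expiry_heap: List[Tuple[float, str]] = []   # (expires_at, hold_id)

    def add(self, hold_id: str, resource_id: str, expires_at: float) -> None:
        self._holds[hold_id] = (resource_id, expires_at)
        heapq.heappush(self._expiry_heap, (expires_at, hold_id))

    def get(self, hold_id: str) -> Optional[Tuple[str, float]]:
        return self._holds.get(hold_id)

    def remove(self, hold_id: str) -> None:
        # Lazy deletion: the heap entry stays behind and is skipped in expire().
        self._holds.pop(hold_id, None)

    def expire(self, now: float) -> List[str]:
        """Remove and return the IDs of all holds whose expiration is before `now`."""
        expired = []
        while self._expiry_heap and self._expiry_heap[0][0] < now:
            expires_at, hold_id = heapq.heappop(self._expiry_heap)
            current = self._holds.get(hold_id)
            # Ignore stale heap entries (hold already removed or re-added later).
            if current is not None and current[1] == expires_at:
                del self._holds[hold_id]
                expired.append(hold_id)
        return expired
```

Each call to `expire` only touches holds that are actually due, so its cost depends on the number of expirations rather than the total number of active holds.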
if hold.exists(hold_id) and not hold.is_expired(hold_id):
    reservation.create(hold.resource)
    hold.remove(hold_id)
    return "Confirmed"
else:
    return "Failed"
What will be the output if the hold has expired?
Solution
Step 1: Check hold existence and expiration
The code confirms only if the hold exists and is not expired.
Step 2: Analyze expired hold case
If the hold is expired, the condition fails and the code returns "Failed" without creating a reservation.
Final Answer:
"Failed" -> Option A
Quick Check:
Expired hold = "Failed" confirmation [OK]
- Assuming expired holds confirm successfully
- Expecting errors instead of failure message
- Ignoring hold expiration check
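
To try the confirmation logic end to end, here is a self-contained sketch. The `HoldStore` class, its method names, and the injectable clock are assumptions added for the example; the question's snippet does not define them.

```python
import time
from typing import Dict, Set, Tuple


class HoldStore:
    """In-memory stand-in for the hold storage used by the snippet."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._holds: Dict[str, Tuple[str, float]] = {}   # hold_id -> (resource, expires_at)

    def add(self, hold_id: str, resource: str, ttl_seconds: float) -> None:
        self._holds[hold_id] = (resource, self._clock() + ttl_seconds)

    def exists(self, hold_id: str) -> bool:
        return hold_id in self._holds

    def is_expired(self, hold_id: str) -> bool:
        return self._holds[hold_id][1] < self._clock()

    def resource(self, hold_id: str) -> str:
        return self._holds[hold_id][0]

    def remove(self, hold_id: str) -> None:
        self._holds.pop(hold_id, None)


def confirm(holds: HoldStore, reservations: Set[str], hold_id: str) -> str:
    """Turn a live hold into a reservation; expired or unknown holds fail."""
    if holds.exists(hold_id) and not holds.is_expired(hold_id):
        reservations.add(holds.resource(hold_id))
        holds.remove(hold_id)
        return "Confirmed"
    return "Failed"
```

An expired or unknown hold returns "Failed" and leaves the reservation set untouched. Note that this check-then-act sequence is not atomic on its own; with concurrent callers it would need to run inside a transaction or under a lock.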
for hold in holds:
    if hold.expiration_time < current_time:
        holds.remove(hold)
What is the main issue with this code?
Solution
Step 1: Understand iteration and modification
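Removing items from a list while iterating over it shifts the remaining elements, so the loop skips the element that follows each removal. In some languages and collection types (for example Python dicts or Java collections) it raises a runtime error instead.
Step 2: Identify correct approach
Collect the expired holds in a separate list and remove them afterwards, or iterate over a copy of the list.
Final Answer:
Modifying a list while iterating causes skipped elements or errors -> Option B
Quick Check: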
Remove during iteration = skipped elements [OK]
- Ignoring iteration modification side effects
- Assuming expiration comparison is wrong
- Thinking loop type causes the issue
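
One way to write the cleanup safely is to build the filtered list first and then swap it in. The `Hold` dataclass and the function name `remove_expired` are hypothetical; only `expiration_time` comes from the snippet in the question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hold:
    hold_id: str
    expiration_time: float


def remove_expired(holds: List[Hold], current_time: float) -> List[Hold]:
    """Drop expired holds from `holds` in place and return the removed ones."""
    expired = [h for h in holds if h.expiration_time < current_time]
    # Slice assignment replaces the contents in place, so any other
    # reference to the same list object also sees the cleaned-up version.
    holds[:] = [h for h in holds if h.expiration_time >= current_time]
    return expired
```

With holds `a`, `b`, `c`, `d` where only `c` is still live, this version removes `a`, `b`, and `d`. The loop from the question would leave `b` behind, because removing `a` shifts `b` into the index the iterator has already passed.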
Solution
Step 1: Prevent double booking with distributed locks
Distributed locks ensure only one user can hold a resource at a time across servers.
Step 2: Use TTL in distributed cache for hold expiration
TTL automatically expires holds after timeout, preventing indefinite blocking.
Step 3: Confirm holds atomically
Atomic transactions guarantee reservation creation without race conditions.
Final Answer:
Use distributed locks on resources, store holds with TTL in a distributed cache, and confirm with atomic transactions -> Option A
Quick Check:
Distributed locks + TTL + atomic confirm = scalable, safe system [OK]
- Ignoring concurrency causing double booking
- Relying on client-side expiration only
- Not using atomic operations for confirmation
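
The three ingredients from the solution above (mutual exclusion, TTL-based expiry, atomic confirmation) can be seen together in a single-process sketch. This is a simplified stand-in, not a distributed implementation: `threading.Lock` plays the role of a distributed lock, a plain dict with expiry timestamps plays the role of a TTL cache, and the class name `ReservationService` and the 300-second TTL are assumptions.

```python
import threading
import time
import uuid
from typing import Dict, Optional, Set, Tuple


class ReservationService:
    def __init__(self, hold_ttl: float = 300.0, clock=time.monotonic) -> None:
        self._lock = threading.Lock()   # stand-in for a per-resource distributed lock
        self._ttl = hold_ttl
        self._clock = clock
        self._holds: Dict[str, Tuple[str, float]] = {}   # resource -> (hold_id, expires_at)
        self._reserved: Set[str] = set()

    def place_hold(self, resource: str) -> Optional[str]:
        """Return a new hold ID, or None if the resource is held or reserved."""
        with self._lock:
            now = self._clock()
            if resource in self._reserved:
                return None
            existing = self._holds.get(resource)
            if existing is not None and existing[1] >= now:
                return None             # a live hold already blocks this resource
            hold_id = uuid.uuid4().hex
            self._holds[resource] = (hold_id, now + self._ttl)
            return hold_id

    def confirm(self, resource: str, hold_id: str) -> bool:
        """Atomically convert a live hold into a reservation."""
        with self._lock:
            existing = self._holds.get(resource)
            if (existing is None or existing[0] != hold_id
                    or existing[1] < self._clock()):
                return False            # unknown, foreign, or expired hold
            del self._holds[resource]
            self._reserved.add(resource)
            return True
```

The server decides expiry from its own clock, so a client cannot keep a hold alive by ignoring the timeout. In a multi-server deployment the lock and the hold table would have to live in shared infrastructure (for example a cache with key TTLs plus a database transaction for the confirm step).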
