| Users / Events | 100 users | 10K users | 1M users | 100M users |
|---|---|---|---|---|
| Event Volume | ~1K events/sec | ~100K events/sec | ~10M events/sec | ~1B events/sec |
| Event Broker Load | Single broker instance | Cluster of brokers | Multi-region broker clusters | Global distributed brokers with partitioning |
| Consumer Instances | Few consumers per service | Scaled consumers with load balancing | Auto-scaling consumers with partition assignment | Thousands of consumers with sharding and geo-distribution |
| Data Storage | Local or small DB | Partitioned DB or NoSQL | Sharded DB clusters or distributed storage | Multi-cloud distributed storage with archiving |
| Latency | Low (single-digit ms) | Low to moderate (1s to 10s of ms) | Moderate (10s to 100s of ms) | Higher latency due to geo-distribution (100s of ms) |
Event-driven design in LLD - Scalability & System Analysis
As load grows beyond small scale, the event broker (message queue) is usually the first bottleneck, because a single broker instance can handle only a limited number of events per second (roughly 10K-100K, depending on hardware and message size). As event volume rises, the broker's CPU, memory, and network bandwidth limits are reached before those of other components.
- Horizontal Scaling: Add more broker instances forming a cluster to distribute event load.
- Partitioning: Split event streams into partitions so consumers can process in parallel.
- Consumer Scaling: Increase number of consumer instances with load balancing and partition assignment.
- Caching: Use caches for frequently accessed event data to reduce storage load.
- Geo-distribution: Deploy brokers and consumers in multiple regions to reduce latency and increase availability.
- Backpressure and Rate Limiting: Control event production rate to avoid overwhelming the system.
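The partitioning and consumer-scaling ideas above can be sketched in a few lines. This is a minimal illustration, not how any particular broker does it: the partition count, the function names `partition_for` and `assign_partitions`, and the round-robin assignment policy are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for this sketch


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map an event key to a partition using a stable hash.

    zlib.crc32 is deterministic across processes, unlike the built-in
    hash() for strings, so every producer picks the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Spread partitions over consumers round-robin (assumed policy)."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


# Events with the same key always land in the same partition,
# which preserves per-key ordering while allowing parallel consumers.
print(partition_for("user-42") == partition_for("user-42"))
print(assign_partitions(NUM_PARTITIONS, ["consumer-1", "consumer-2"]))
```

Keying by something like a user ID keeps each user's events in order within one partition, while different partitions are processed in parallel by different consumers.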
For 10K users generating ~100K events/sec:
- Broker cluster needs to handle 100K events/sec, requiring multiple nodes (each ~20-50K events/sec capacity).
- Consumers must scale to process 100K events/sec, possibly 10-20 instances depending on processing time.
- Storage needs depend on event size; for 1KB events, 100K events/sec = ~100MB/sec = ~8.6TB/day.
- Network bandwidth must support event ingress and egress; a 1 Gbps link carries ~125MB/sec, so ingress alone (~100MB/sec) nearly saturates it, and multiple links or higher cloud bandwidth are needed.
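The estimates above can be reproduced with a short back-of-the-envelope script. The inputs come from the figures in this section (100K events/sec, 1KB events, 20-50K events/sec per broker node, 1 Gbps links); decimal units (1KB = 1,000 bytes) are an assumption chosen to match the ~100MB/sec figure.

```python
import math

EVENTS_PER_SEC = 100_000
EVENT_SIZE_BYTES = 1_000           # 1KB per event, decimal units (assumption)
NODE_CAPACITY_RANGE = (20_000, 50_000)  # events/sec per broker node, from the text
LINK_BYTES_PER_SEC = 1_000_000_000 / 8  # 1 Gbps = 125MB/sec
SECONDS_PER_DAY = 86_400

# Ingress throughput in MB/sec.
throughput_mb_per_sec = EVENTS_PER_SEC * EVENT_SIZE_BYTES / 1e6

# Raw storage per day in TB (no replication or compression considered).
storage_tb_per_day = throughput_mb_per_sec * SECONDS_PER_DAY / 1e6

# Broker nodes needed at the optimistic and pessimistic per-node capacity.
min_nodes = math.ceil(EVENTS_PER_SEC / NODE_CAPACITY_RANGE[1])
max_nodes = math.ceil(EVENTS_PER_SEC / NODE_CAPACITY_RANGE[0])

# Share of a single 1 Gbps link consumed by ingress alone.
link_utilization = (EVENTS_PER_SEC * EVENT_SIZE_BYTES) / LINK_BYTES_PER_SEC

print(f"Throughput: {throughput_mb_per_sec:.0f} MB/sec")
print(f"Storage: {storage_tb_per_day:.2f} TB/day")
print(f"Broker nodes: {min_nodes}-{max_nodes} (before headroom)")
print(f"1 Gbps link utilization (ingress only): {link_utilization:.0%}")
```

Ingress alone uses about 80% of one link, and every consumer read adds egress on top, which is why a single 1 Gbps link is not enough.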
Structure your scalability discussion by first identifying the event volume growth, then pinpoint the bottleneck (usually the event broker). Next, explain how to scale horizontally with clusters and partitions, scale consumers, and manage data storage. Mention latency and geo-distribution considerations. Always justify why each step is needed based on system limits.
Your event broker handles 1,000 events per second. Traffic grows 10x to 10,000 events per second. What do you do first?
Answer: Add more broker instances to form a cluster and partition the event streams to distribute load. This prevents the single broker from becoming a bottleneck and allows consumers to scale processing in parallel.
Practice
Solution
Step 1: Understand event-driven design concept
Event-driven design focuses on reacting to events or actions as they occur, rather than processing everything in a fixed sequence.
Step 2: Compare options with concept
"To allow systems to react to actions as they happen asynchronously" matches this idea because it describes an asynchronous reaction to actions. The other options describe unrelated concepts such as sequential processing, data storage, or static content.
Final Answer:
To allow systems to react to actions as they happen asynchronously -> Option A
Quick Check:
Event-driven design = react asynchronously [OK]
- Confusing event-driven with sequential processing
- Thinking event-driven is about data storage
- Assuming event-driven means static content
Solution
Step 1: Identify roles in event-driven flow
Producers create events, queues hold events, and consumers process events.
Step 2: Arrange correct order
The correct order is Producer sends event to Queue, then Consumer reads from Queue.
Final Answer:
Producer -> Queue -> Consumer -> Option D
Quick Check:
Producer creates, Queue holds, Consumer processes [OK]
- Mixing up producer and consumer order
- Placing queue after consumer
- Ignoring the queue role
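The Producer -> Queue -> Consumer flow can be sketched in-process with Python's standard `queue.Queue` and two threads. This is only an illustration of the roles: the event names, the `SENTINEL` shutdown marker, and the uppercase "processing" step are assumptions for the example, and a real system would use an external broker instead of an in-memory queue.

```python
import queue
import threading

events = queue.Queue()   # the queue holds events between producer and consumer
processed = []           # results collected by the consumer
SENTINEL = None          # hypothetical marker telling the consumer to stop


def producer(items):
    """Create events and hand them to the queue."""
    for item in items:
        events.put(item)
    events.put(SENTINEL)


def consumer():
    """Read events from the queue and process them in arrival order."""
    while True:
        item = events.get()  # blocks until an event is available
        if item is SENTINEL:
            break
        processed.append(item.upper())


consumer_thread = threading.Thread(target=consumer)
producer_thread = threading.Thread(
    target=producer, args=(["signup", "login", "logout"],)
)
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()

print(processed)  # ['SIGNUP', 'LOGIN', 'LOGOUT']
```

The producer never calls the consumer directly; the queue is the only thing they share, which is what lets each side run and scale independently.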
event_queue = []

def produce(event):
    event_queue.append(event)

def consume():
    if event_queue:
        return event_queue.pop(0)
    return None

produce('A')
produce('B')
print(consume())
print(consume())
print(consume())

What is the output?
Solution
Step 1: Trace event production
Two events 'A' and 'B' are added to the queue in order: ['A', 'B'].
Step 2: Trace event consumption
consume() removes and returns the first event: first 'A', then 'B', then None when empty.
Final Answer:
A B None -> Option C
Quick Check:
FIFO queue returns A then B then None [OK]
- Assuming LIFO instead of FIFO
- Forgetting to check empty queue
- Mixing order of events
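The same FIFO behavior can be had with `collections.deque`, sketched here as an alternative to the list used in the question. `list.pop(0)` shifts every remaining element, so it costs O(n) per call, while `deque.popleft()` is O(1). The function names mirror the question; swapping in `deque` is a choice made for this sketch.

```python
from collections import deque

event_queue = deque()


def produce(event):
    """Append the event at the tail of the queue."""
    event_queue.append(event)


def consume():
    """Remove and return the oldest event, or None if the queue is empty."""
    if event_queue:
        return event_queue.popleft()
    return None


produce('A')
produce('B')
results = [consume(), consume(), consume()]
print(results)  # ['A', 'B', None]
```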
def consume(event_queue):
    event = event_queue.pop()
    process(event)

What is the main issue with this code?
Solution
Step 1: Analyze pop usage without check
pop() removes and returns the last item, but the code never checks whether the queue is empty first.
Step 2: Identify error risk
Calling pop() on an empty list raises an IndexError at runtime; the code lacks a safety check.
Final Answer:
It does not check if the queue is empty before popping -> Option A
Quick Check:
pop() on empty list causes error [OK]
- Ignoring empty queue check
- Confusing pop() order with error
- Assuming process() is undefined error
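One possible fix is to guard the pop with an emptiness check. In this sketch, `process` is a hypothetical stand-in handler (the question does not define it), and `pop(0)` replaces `pop()` on the assumption that events should be handled oldest-first.

```python
def process(event):
    """Hypothetical handler; stands in for real event processing."""
    return f"processed {event}"


def consume(event_queue):
    """Process the oldest event, or return None if nothing is queued."""
    if not event_queue:
        # Empty queue: nothing to do, and no IndexError is raised.
        return None
    event = event_queue.pop(0)  # oldest event first (FIFO, assumed)
    return process(event)


pending = ['A', 'B']
print(consume(pending))  # processed A
print(consume([]))       # None
```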
Solution
Step 1: Understand scalability and fault tolerance needs
Social media apps have high event volume; parallel processing and fault tolerance are key.
Step 2: Evaluate options for scalability
Distributed queues with multiple consumers allow load balancing and fault tolerance. A single consumer limits throughput. Synchronous processing blocks the system. Sending events directly to consumers lacks buffering and fault tolerance.
Final Answer:
Use a distributed message queue with multiple consumers processing events in parallel -> Option B
Quick Check:
Distributed queues + parallel consumers = scalable & fault tolerant [OK]
- Choosing single consumer limits throughput
- Ignoring asynchronous processing benefits
- Skipping queue leads to lost events
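The "one queue, many parallel consumers" idea can be illustrated in a single process with worker threads sharing a `queue.Queue`. This is only a small stand-in for a distributed message queue: the worker count, the bounded queue size, the `None` stop marker, and the doubling "processing" step are all assumptions for the example.

```python
import queue
import threading

NUM_CONSUMERS = 3                    # hypothetical number of parallel consumers
events = queue.Queue(maxsize=100)    # bounded: put() blocks when full (backpressure)
results = []
results_lock = threading.Lock()      # protects the shared results list


def consumer_worker():
    """Pull events until a None stop marker arrives."""
    while True:
        event = events.get()
        if event is None:
            events.task_done()
            break
        with results_lock:
            results.append(event * 2)  # stand-in for real processing
        events.task_done()


workers = [threading.Thread(target=consumer_worker) for _ in range(NUM_CONSUMERS)]
for worker in workers:
    worker.start()

for event in range(10):
    events.put(event)
for _ in workers:
    events.put(None)   # one stop marker per consumer

events.join()          # wait until every queued item has been handled
for worker in workers:
    worker.join()

print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Each event is handled by exactly one worker, so adding workers raises throughput without changing the producer. The results are sorted before printing because the order in which workers finish is not deterministic.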
