
Event-driven design in LLD - Scalability & System Analysis

Scalability Analysis - Event-driven design
Growth Table: Event-driven Design Scaling
| Users / Events | 100 users | 10K users | 1M users | 100M users |
|---|---|---|---|---|
| Event Volume | ~1K events/sec | ~100K events/sec | ~10M events/sec | ~1B events/sec |
| Event Broker Load | Single broker instance | Cluster of brokers | Multi-region broker clusters | Global distributed brokers with partitioning |
| Consumer Instances | Few consumers per service | Scaled consumers with load balancing | Auto-scaling consumers with partition assignment | Thousands of consumers with sharding and geo-distribution |
| Data Storage | Local or small DB | Partitioned DB or NoSQL | Sharded DB clusters or distributed storage | Multi-cloud distributed storage with archiving |
| Latency | Low (ms) | Low to moderate (ms to 10s of ms) | Moderate (10s to 100s of ms) | Higher due to geo-distribution (100s of ms) |
First Bottleneck

At small scale, the event broker (message queue) is the first bottleneck because a single broker instance can handle only a limited number of events per second (around 10K-100K). As event volume grows, broker CPU, memory, and network bandwidth limits are reached first.
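This limit is easy to sanity-check with arithmetic. The sketch below assumes illustrative numbers (a per-broker ceiling of 50K events/sec and 10 events/sec per active user, consistent with the growth table above); real capacities depend on hardware, event size, and replication settings.

```python
import math

# Assumed figures for illustration, not vendor benchmarks:
SINGLE_BROKER_CAPACITY = 50_000   # events/sec one broker node can sustain
EVENTS_PER_USER = 10              # events/sec generated per active user

def brokers_needed(users: int) -> int:
    """Minimum broker nodes required to absorb the event volume for `users`."""
    volume = users * EVENTS_PER_USER
    return max(1, math.ceil(volume / SINGLE_BROKER_CAPACITY))

for users in (100, 10_000, 1_000_000):
    print(f"{users:>9} users -> {brokers_needed(users)} broker node(s)")
```

Even with these generous assumptions, a single broker runs out of headroom somewhere between the 100-user and 10K-user tiers, which is why the broker is the first component to scale out.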

Scaling Solutions
  • Horizontal Scaling: Add more broker instances forming a cluster to distribute event load.
  • Partitioning: Split event streams into partitions so consumers can process in parallel.
  • Consumer Scaling: Increase number of consumer instances with load balancing and partition assignment.
  • Caching: Use caches for frequently accessed event data to reduce storage load.
  • Geo-distribution: Deploy brokers and consumers in multiple regions to reduce latency and increase availability.
  • Backpressure and Rate Limiting: Control event production rate to avoid overwhelming the system.
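Partitioning and consumer scaling interact: events are routed to a partition by a stable hash of their key (preserving per-key ordering), and partitions are divided among consumer instances. A minimal sketch, with an assumed partition count of 8 and a simple round-robin assignment rather than any specific broker's rebalancing protocol:

```python
import hashlib

NUM_PARTITIONS = 8  # assumed partition count for illustration

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash partitioning: events with the same key always land on
    the same partition, so per-key ordering is preserved."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def assign_partitions(num_partitions: int, num_consumers: int) -> dict:
    """Round-robin assignment: a consumer may own several partitions,
    but each partition has exactly one owner."""
    assignment = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        assignment[p % num_consumers].append(p)
    return assignment

# e.g. 8 partitions spread across 3 consumer instances
print(assign_partitions(8, 3))  # {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5]}
```

Adding consumers beyond the partition count buys nothing, so the partition count chosen up front caps consumer parallelism.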
Back-of-Envelope Cost Analysis

For 10K users generating ~100K events/sec:

  • Broker cluster needs to handle 100K events/sec, requiring multiple nodes (each ~20-50K events/sec capacity).
  • Consumers must scale to process 100K events/sec, possibly 10-20 instances depending on processing time.
  • Storage needs depend on event size; for 1KB events, 100K events/sec = ~100MB/sec = ~8.6TB/day.
  • Network bandwidth must support both event ingress and egress; a 1 Gbps link carries ~125MB/sec, so multiple links or a higher cloud bandwidth tier is needed.
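The estimates above can be reproduced with a few lines of arithmetic. The per-broker capacity of 30K events/sec is an assumption within the 20-50K range stated above, and the bandwidth figure doubles throughput to cover both ingress and egress:

```python
import math

EVENTS_PER_SEC = 100_000
EVENT_SIZE_BYTES = 1_024        # ~1KB per event
BROKER_CAPACITY = 30_000        # assumed events/sec per broker node
LINK_CAPACITY_MB = 125          # ~1 Gbps expressed in MB/sec

throughput_mb = EVENTS_PER_SEC * EVENT_SIZE_BYTES / 1_000_000  # ~102 MB/sec
storage_tb_per_day = throughput_mb * 86_400 / 1_000_000        # ~8.8 TB/day
broker_nodes = math.ceil(EVENTS_PER_SEC / BROKER_CAPACITY)     # 4 nodes
links = math.ceil(2 * throughput_mb / LINK_CAPACITY_MB)        # ingress + egress

print(f"{throughput_mb:.0f} MB/sec, {storage_tb_per_day:.1f} TB/day, "
      f"{broker_nodes} broker nodes, {links} x 1 Gbps links")
```

The small gap between ~8.6 and ~8.8 TB/day comes from using 1,024 vs 1,000 bytes per KB; in back-of-envelope work either convention is fine as long as it is stated.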
Interview Tip

Structure your scalability discussion by first identifying the event volume growth, then pinpoint the bottleneck (usually the event broker). Next, explain how to scale horizontally with clusters and partitions, scale consumers, and manage data storage. Mention latency and geo-distribution considerations. Always justify why each step is needed based on system limits.

Self Check

Your event broker handles 1,000 events per second. Traffic grows 10x to 10,000 events per second. What do you do first?

Answer: Add more broker instances to form a cluster and partition the event streams to distribute load. This prevents the single broker from becoming a bottleneck and allows consumers to scale processing in parallel.
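That answer can be made quantitative. Assuming each broker node tops out near its current 1,000 events/sec and an assumed rule of thumb of 4 partitions per broker (not a standard, just a starting ratio for consumer parallelism):

```python
import math

def scale_out(target_eps: int, per_broker_eps: int,
              partitions_per_broker: int = 4) -> tuple[int, int]:
    """First response to a traffic spike: size the broker cluster for the
    new event rate, then pick a partition count so consumers can
    parallelize. `partitions_per_broker` is an assumed rule of thumb."""
    brokers = math.ceil(target_eps / per_broker_eps)
    partitions = brokers * partitions_per_broker
    return brokers, partitions

# Traffic grows 10x from 1,000 to 10,000 events/sec
print(scale_out(10_000, 1_000))  # (10, 40)
```

Ten broker nodes absorb the new load, and 40 partitions leave room to scale consumers well beyond ten instances later without repartitioning.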

Key Result
Event-driven design scales by clustering and partitioning event brokers and scaling consumers horizontally. The first bottleneck is the event broker's capacity, which is addressed by adding broker nodes and partitioning event streams.