LLD / system_design / ~25 mins

Event-driven design in LLD - System Design Exercise

Design: Event-driven design system
The design focuses on the core event-driven architecture: event producers, the event broker, and event consumers. Detailed UI design and the specific business logic inside consumers are out of scope.
Functional Requirements
FR1: The system should allow components to communicate by sending and receiving events asynchronously.
FR2: It must support decoupling between event producers and consumers.
FR3: The system should handle high throughput of events, up to 10,000 events per second.
FR4: Events must be processed in near real-time with p99 latency under 200ms.
FR5: The system must persist events so they can be replayed for recovery.
FR6: Allow multiple consumers to subscribe to the same event type.
FR7: Ensure reliable delivery of events with at-least-once semantics.
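At-least-once delivery (FR7) means a consumer can receive the same event more than once, for example when an acknowledgement is lost and the broker redelivers. A common way to keep that safe is an idempotent consumer that deduplicates on event_id. The sketch below is a minimal illustration under stated assumptions: IdempotentConsumer and handle are hypothetical names, and the processed-ID set lives in memory, whereas a real service would keep it in a durable store.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Hypothetical consumer that tolerates redelivery under at-least-once semantics.

    Assumption: every event carries a unique event_id, and the set of processed
    IDs fits in memory (a real service would persist it to survive restarts).
    """
    processed_ids: set = field(default_factory=set)
    applied: list = field(default_factory=list)

    def handle(self, event_id: str, payload: dict) -> bool:
        # A redelivered event has an ID we have already seen: skip the side effect.
        if event_id in self.processed_ids:
            return False
        self.applied.append(payload)      # stand-in for the real business logic
        self.processed_ids.add(event_id)  # record the ID only after the work succeeded
        return True


consumer = IdempotentConsumer()
consumer.handle("evt-1", {"order": 42})
consumer.handle("evt-1", {"order": 42})  # duplicate delivery from the broker
print(len(consumer.applied))  # 1
```

Recording the ID only after the business logic succeeds means a crash mid-processing leads to a retry rather than a silently skipped event.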
Non-Functional Requirements
NFR1: System availability target is 99.9% uptime (about 8.77 hours downtime per year).
NFR2: The system should scale horizontally to handle increasing event volume.
NFR3: Event ordering must be preserved within each event type, but not globally.
NFR4: Use technologies suitable for low-latency asynchronous communication.
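The downtime figure in NFR1 follows directly from the availability target, assuming a 365.25-day year:

```latex
\[
\text{downtime}_{\max} = (1 - 0.999) \times 365.25\,\text{days} \times 24\,\tfrac{\text{h}}{\text{day}}
= 0.001 \times 8766\,\text{h} \approx 8.77\,\text{h per year}
\]
```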
Think Before You Design
Key Components
Event producers (services or modules generating events)
Event broker or message queue (middleware to route events)
Event consumers (services or modules processing events)
Event storage for persistence and replay
Monitoring and alerting components
Design Patterns
Publish-Subscribe pattern
Event Sourcing
Message Queueing
Event Streaming
Dead Letter Queue for failed events
Reference Architecture
 +----------------+       +----------------+       +----------------+
 | Event Producer | ----> | Event Broker   | ----> | Event Consumer |
 +----------------+       +----------------+       +----------------+
          |                        |                        |
          |                        |                        |
          |                        |                        |
          |                        v                        |
          |                +----------------+               |
          |                | Event Storage  | <-------------+
          |                +----------------+               |
          |                        |                        |
          +------------------------+------------------------+
Components
Event Producer
Any service or module generating events
Creates and publishes events to the event broker asynchronously
Event Broker
Apache Kafka / RabbitMQ / AWS SNS+SQS
Receives events from producers, routes them to consumers, and ensures delivery guarantees
Event Consumer
Microservices or modules subscribing to events
Processes events asynchronously and performs business logic
Event Storage
Kafka log storage or a database like Cassandra
Persists events for replay, audit, and recovery
Monitoring & Alerting
Prometheus + Grafana or CloudWatch
Tracks system health, event throughput, and failures
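To make the component roles concrete, here is a minimal single-process sketch of a broker that appends each event to a log (standing in for Event Storage) before fanning it out to every subscriber of that event type. InMemoryBroker and its methods are hypothetical; delivery here is synchronous for simplicity, whereas the brokers listed above deliver asynchronously and across machines.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Hypothetical single-process stand-in for the Event Broker plus Event Storage."""

    def __init__(self) -> None:
        self.log: list[tuple[str, dict]] = []  # append-only log playing the Event Storage role
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Several consumers may register for the same event type (FR6).
        self.subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        # Persist first, then fan out, so every event can be replayed later (FR5).
        offset = len(self.log)
        self.log.append((event_type, payload))
        for handler in self.subscribers[event_type]:
            handler(payload)  # synchronous here; a real broker delivers asynchronously
        return offset

    def replay(self, event_type: str, handler: Handler, from_offset: int = 0) -> None:
        # Re-deliver stored events of one type, e.g. to bootstrap a new consumer.
        for stored_type, payload in self.log[from_offset:]:
            if stored_type == event_type:
                handler(payload)


broker = InMemoryBroker()
emails: list = []
analytics: list = []
broker.subscribe("user.signup", emails.append)
broker.subscribe("user.signup", analytics.append)
broker.publish("user.signup", {"user_id": 1})

late_joiner: list = []
broker.replay("user.signup", late_joiner.append)  # not subscribed at publish time, catches up from the log
print(emails, analytics, late_joiner)
```

Because the log is written before delivery, a consumer that arrives late can still catch up by replaying from an offset, which is the idea behind replay from Kafka-style log storage.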
Request Flow
1. Event Producer creates an event and publishes it to the Event Broker.
2. Event Broker receives the event and stores it in Event Storage for durability.
3. Event Broker routes the event asynchronously to all subscribed Event Consumers.
4. Event Consumers receive the event, process it independently, and acknowledge it once processing succeeds; unacknowledged events are redelivered, which gives at-least-once delivery.
5. If processing still fails after a bounded number of retries, the event is sent to a Dead Letter Queue (DLQ) for later inspection.
6. Monitoring components continuously track event flow and system health.
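The retry-then-DLQ step can be sketched as a bounded retry loop that parks the event in a DLQ once the retry budget is exhausted. In this Python sketch, deliver, MAX_ATTEMPTS, and the plain list used as the DLQ are assumptions for illustration; production consumers usually also add backoff between attempts, which is omitted here.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: the retry budget is a tunable, not specified by the design


def deliver(event: dict, handler: Callable[[dict], None], dead_letters: list) -> bool:
    """Run a consumer handler up to MAX_ATTEMPTS times, then move the event to a DLQ.

    dead_letters is a plain list standing in for a real Dead Letter Queue.
    No backoff between attempts, to keep the sketch short.
    """
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        try:
            handler(event)
            return True  # processed successfully; nothing goes to the DLQ
        except Exception as exc:  # any handler failure counts as one failed attempt
            last_error = exc
    dead_letters.append({"event": event, "error": repr(last_error), "attempts": MAX_ATTEMPTS})
    return False


def failing_handler(event: dict) -> None:
    raise ValueError("downstream service unavailable")


dlq: list = []
print(deliver({"event_id": "evt-9"}, failing_handler, dlq))   # False
print(deliver({"event_id": "evt-10"}, lambda e: None, dlq))   # True
print(len(dlq))  # 1
```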
Database Schema
Entities:
- Event: {event_id (PK), event_type, payload, timestamp, status}
- Subscription: {subscription_id (PK), consumer_id (FK -> Consumer), event_type}
- Consumer: {consumer_id (PK), name, endpoint}
Relationships:
- One Consumer can have many Subscriptions (1:N), one per event type it listens to.
- One Event is delivered to every Consumer holding a Subscription for its event_type (1:N fan-out through Subscriptions).
- Event Storage holds all Events for replay and audit.
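The same schema can be written as Python dataclasses, together with a helper that resolves which consumers should receive a given event. This is an illustrative sketch: the recipients helper and the default status value are assumptions, not part of the schema above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    event_id: str
    event_type: str
    payload: dict
    timestamp: datetime
    status: str = "PENDING"  # assumption: allowed status values are not specified


@dataclass
class Consumer:
    consumer_id: str
    name: str
    endpoint: str


@dataclass
class Subscription:
    subscription_id: str
    consumer_id: str  # references Consumer.consumer_id
    event_type: str


def recipients(event: Event, subscriptions: list[Subscription]) -> list[str]:
    """Hypothetical helper: consumer IDs whose subscriptions match the event's type."""
    return [s.consumer_id for s in subscriptions if s.event_type == event.event_type]


subs = [
    Subscription("s1", "billing", "order.created"),
    Subscription("s2", "email", "order.created"),
    Subscription("s3", "email", "user.signup"),
]
evt = Event("e1", "order.created", {"order_id": 42}, datetime.now(timezone.utc))
print(recipients(evt, subs))  # ['billing', 'email']
```

One design point worth noting: since one event fans out to many consumers, a single status field on Event cannot record per-consumer progress; tracking delivery status per (event_id, subscription_id) pair is one way to handle that.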
Scaling Discussion
Bottlenecks
Event Broker throughput limits when event volume grows beyond capacity.
Event Storage size and read/write performance under heavy load.
Consumer processing speed causing backpressure.
Network latency affecting event delivery speed.
Single point of failure in Event Broker or Storage.
Solutions
Scale Event Broker horizontally by partitioning topics and adding brokers.
Use distributed, scalable storage like Kafka logs or Cassandra for event persistence.
Implement consumer scaling with multiple instances and load balancing.
Use efficient serialization and compression to reduce network overhead.
Deploy redundant brokers and storage clusters with failover mechanisms.
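Partitioning has to respect NFR3: events of the same type must stay in order. One common approach is to use the event type as the partitioning key, so every event of one type lands in the same partition, where order is preserved. Kafka, for instance, hashes a record's key to choose a partition and guarantees ordering only within a partition. The sketch below uses CRC32 as a stand-in hash (it is not Kafka's partitioner), and partition_for and NUM_PARTITIONS are hypothetical.

```python
import zlib

NUM_PARTITIONS = 6  # hypothetical partition count for one topic


def partition_for(event_type: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # A stable hash (CRC32) maps every event of one type to the same partition.
    # Python's built-in hash() is salted per process for str, so it is avoided here.
    return zlib.crc32(event_type.encode("utf-8")) % num_partitions


partitions: dict[int, list[str]] = {p: [] for p in range(NUM_PARTITIONS)}
incoming = [("e1", "order.created"), ("e2", "user.signup"), ("e3", "order.created")]
for event_id, event_type in incoming:
    partitions[partition_for(event_type)].append(event_id)  # appends preserve arrival order
```

The trade-off: a single very busy event type cannot be spread across partitions without giving up its ordering, and changing the partition count remaps keys, so the count is usually planned up front.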
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying questions, 20 minutes designing the architecture and data flow, 10 minutes discussing scaling and trade-offs, and 5 minutes summarizing.
Explain how event-driven design decouples components for flexibility and scalability.
Describe the role of the event broker and how it ensures reliable delivery.
Discuss event persistence for replay and fault tolerance.
Highlight how consumers can scale independently and handle failures.
Mention trade-offs like eventual consistency and ordering guarantees.
Show awareness of bottlenecks and practical scaling strategies.

Practice

1. What is the main purpose of event-driven design in system architecture?
easy
A. To allow systems to react to actions as they happen asynchronously
B. To process all tasks sequentially in a fixed order
C. To store data permanently in a database
D. To create static web pages without user interaction

Solution

  1. Step 1: Understand event-driven design concept

    Event-driven design focuses on reacting to events or actions as they occur, rather than processing everything in a fixed sequence.
  2. Step 2: Compare options with concept

    Option A matches this idea: it describes systems reacting to actions asynchronously as they happen. The other options describe unrelated concepts: sequential processing, data storage, or static content.
  3. Final Answer:

    To allow systems to react to actions as they happen asynchronously -> Option A
  4. Quick Check:

    Event-driven design = react asynchronously [OK]
Hint: Event-driven means reacting to events as they happen [OK]
Common Mistakes:
  • Confusing event-driven with sequential processing
  • Thinking event-driven is about data storage
  • Assuming event-driven means static content
2. Which of the following is the correct sequence in an event-driven system?
easy
A. Consumer -> Producer -> Queue
B. Producer -> Consumer -> Queue
C. Queue -> Producer -> Consumer
D. Producer -> Queue -> Consumer

Solution

  1. Step 1: Identify roles in event-driven flow

    Producers create events, queues hold events, and consumers process events.
  2. Step 2: Arrange correct order

    The producer sends the event to the queue, and the consumer then reads it from the queue.
  3. Final Answer:

    Producer -> Queue -> Consumer -> Option D
  4. Quick Check:

    Producer creates, Queue holds, Consumer processes [OK]
Hint: Events flow: Producer to Queue to Consumer [OK]
Common Mistakes:
  • Mixing up producer and consumer order
  • Placing queue after consumer
  • Ignoring the queue role
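The Producer -> Queue -> Consumer flow can be shown in a few lines of Python with asyncio.Queue. This is an in-process sketch, not a distributed broker; the None sentinel used to stop the consumer is an assumption of this example.

```python
import asyncio


async def producer(queue: asyncio.Queue, events: list[str]) -> None:
    for event in events:
        await queue.put(event)  # publish and move on; no waiting for processing
    await queue.put(None)       # sentinel: no more events are coming


async def consumer(queue: asyncio.Queue, handled: list[str]) -> None:
    while True:
        event = await queue.get()  # suspends until an event is available
        if event is None:
            break
        handled.append(event)


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    handled: list[str] = []
    await asyncio.gather(producer(queue, ["A", "B", "C"]), consumer(queue, handled))
    return handled


print(asyncio.run(main()))  # ['A', 'B', 'C']
```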
3. Consider this simplified event-driven code snippet:
event_queue = []

def produce(event):
    event_queue.append(event)

def consume():
    if event_queue:
        return event_queue.pop(0)
    return None

produce('A')
produce('B')
print(consume())
print(consume())
print(consume())

What is the output?
medium
A. None None None
B. B A None
C. A B None
D. A None B

Solution

  1. Step 1: Trace event production

    Two events 'A' and 'B' are added to the queue in order: ['A', 'B'].
  2. Step 2: Trace event consumption

    consume() removes and returns the first event: first 'A', then 'B', then None when empty.
  3. Final Answer:

    A B None -> Option C
  4. Quick Check:

    FIFO queue returns A then B then None [OK]
Hint: Queue pops first-in event first (FIFO) [OK]
Common Mistakes:
  • Assuming LIFO instead of FIFO
  • Forgetting to check empty queue
  • Mixing order of events
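The snippet is correct, but list.pop(0) shifts every remaining element, so each dequeue costs O(n). A sketch of the same queue using collections.deque, whose popleft() is O(1), yields the same sequence of results:

```python
from collections import deque
from typing import Optional

event_queue: deque = deque()


def produce(event: str) -> None:
    event_queue.append(event)  # enqueue at the right end


def consume() -> Optional[str]:
    # popleft() removes from the left end in O(1); list.pop(0) is O(n).
    return event_queue.popleft() if event_queue else None


produce('A')
produce('B')
print(consume(), consume(), consume())  # A B None
```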
4. In an event-driven system, a developer wrote this code snippet:
def consume(event_queue):
    event = event_queue.pop()
    process(event)

What is the main issue with this code?
medium
A. It does not check if the queue is empty before popping
B. It adds events instead of removing them
C. It uses an undefined function 'process'
D. It processes events in reverse order, not FIFO

Solution

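  1. Step 1: Analyze pop() usage without a check

    pop() with no index removes the last item, and the code never checks whether the queue is empty first.
  2. Step 2: Identify the error risk

    Calling pop() on an empty list raises IndexError, so the consumer crashes whenever no events are pending. (Popping the last item also makes the order LIFO rather than FIFO, a secondary bug; the unguarded pop is the main issue because it crashes the consumer.)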
  3. Final Answer:

    It does not check if the queue is empty before popping -> Option A
  4. Quick Check:

    pop() on empty list causes error [OK]
Hint: Always check queue not empty before pop() [OK]
Common Mistakes:
  • Ignoring empty queue check
  • Confusing pop() order with error
  • Assuming process() is undefined error
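A corrected version of the consumer guards against an empty queue and takes the oldest event first. In this sketch, process is passed in as a parameter (the original snippet assumes it is defined elsewhere), and a deque replaces the list so FIFO removal is O(1).

```python
from collections import deque
from typing import Callable, Optional


def consume(event_queue: deque, process: Callable[[str], None]) -> Optional[str]:
    """Safe consumer: returns None on an empty queue instead of raising IndexError."""
    if not event_queue:
        return None                # nothing pending
    event = event_queue.popleft()  # oldest event first (FIFO), unlike list.pop()
    process(event)
    return event


handled: list = []
pending = deque(['A', 'B'])
for _ in range(3):
    consume(pending, handled.append)  # third call hits the empty queue safely
print(handled)  # ['A', 'B']
```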
5. You are designing a scalable event-driven system for a social media app. Which approach best improves scalability and fault tolerance?
hard
A. Store all events in a database and process them synchronously
B. Use a distributed message queue with multiple consumers processing events in parallel
C. Use a single queue and one consumer to ensure event order
D. Send events directly from producer to consumer without queue

Solution

  1. Step 1: Understand scalability and fault tolerance needs

    Social media apps have high event volume; parallel processing and fault tolerance are key.
  2. Step 2: Evaluate options for scalability

    Distributed queues with multiple consumers allow load balancing and fault tolerance. A single consumer limits throughput, synchronous processing blocks the system on every event, and sending directly from producer to consumer provides no buffering or fault tolerance.
  3. Final Answer:

    Use a distributed message queue with multiple consumers processing events in parallel -> Option B
  4. Quick Check:

    Distributed queues + parallel consumers = scalable & fault tolerant [OK]
Hint: Parallel consumers on distributed queue scale best [OK]
Common Mistakes:
  • Choosing a single consumer, which limits throughput
  • Ignoring asynchronous processing benefits
  • Skipping the queue, which risks lost events when a consumer is down
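Option B can be illustrated in a single process, with worker threads standing in for consumer instances and queue.Queue standing in for the distributed queue (such as a Kafka consumer group or SQS). NUM_WORKERS and the doubling "business logic" are placeholders. Results arrive in a nondeterministic order, which is why the check sorts them; that is the ordering trade-off parallel consumers introduce, and why NFR3 limits ordering to each event type.

```python
import queue
import threading

NUM_WORKERS = 4  # hypothetical consumer-group size


def worker(events: queue.Queue, results: list, lock) -> None:
    """One consumer instance: pull events until it receives a None sentinel."""
    while True:
        event = events.get()
        if event is None:
            break
        with lock:
            results.append(event * 2)  # placeholder for real event handling


events: queue.Queue = queue.Queue()
results: list = []
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(events, results, lock)) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for i in range(100):
    events.put(i)     # producer side: enqueue 100 events
for _ in threads:
    events.put(None)  # one stop sentinel per worker, queued after all real events

for t in threads:
    t.join()
print(sorted(results) == [i * 2 for i in range(100)])  # True
```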