Event sourcing in HLD - System Design Exercise

Design: Event Sourcing System

This design covers event storage, event processing, state rebuilding, and read model projection. UI design and specific business domain logic are out of scope.

Functional Requirements

FR1: Store all changes to application state as a sequence of events.

FR2: Rebuild current state by replaying events.

FR3: Support querying current state efficiently.

FR4: Allow auditing and debugging by inspecting event history.

FR5: Support event versioning and schema evolution.

FR6: Handle concurrent updates safely.

Non-Functional Requirements

NFR1: System must handle 10,000 events per second.

NFR2: Event replay latency for rebuilding state should be under 5 seconds for 1 million events.

NFR3: Availability target of 99.9% uptime.

NFR4: Event storage must be durable and immutable.

NFR5: Support eventual consistency for read models.
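The throughput and replay targets above translate into concrete numbers with a little arithmetic. The sketch below is only a back-of-the-envelope estimate; the average event size of 1 KiB is an assumption made for illustration and does not come from the requirements.

```python
# Back-of-the-envelope sizing for NFR1 and NFR2.
EVENTS_PER_SECOND = 10_000        # NFR1
REPLAY_EVENTS = 1_000_000         # NFR2
REPLAY_BUDGET_SECONDS = 5         # NFR2
AVG_EVENT_BYTES = 1_024           # assumption: 1 KiB per event, not from the requirements

SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
storage_per_day_gib = events_per_day * AVG_EVENT_BYTES / 2**30
required_replay_rate = REPLAY_EVENTS / REPLAY_BUDGET_SECONDS

print(f"events per day: {events_per_day:,}")
print(f"raw storage per day: {storage_per_day_gib:.1f} GiB")
print(f"required replay rate: {required_replay_rate:,.0f} events/s")
```

Under these assumptions the system accumulates 864 million events per day, and a replay of 1 million events must sustain 200,000 events per second to meet NFR2, which is why snapshots matter for long-lived aggregates.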

Think Before You Design

Key Components

Event Store (append-only log storage)

Command Handler (validates and creates events)

Event Processor (updates read models)

Snapshot Store (to speed up state rebuilding)

Read Model Database (for queries)

Message Broker (for event distribution)

Design Patterns

Event Sourcing pattern

CQRS (Command Query Responsibility Segregation)

Snapshotting

Event Versioning and Upcasting

Idempotent Event Processing
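Of the patterns listed, upcasting is the least self-explanatory, so here is a minimal sketch of it. Stored events are never rewritten; old versions are converted to the latest schema when they are read. The event shape, the `schema_version` field, and the `UserRegistered` example are hypothetical and chosen only for illustration.

```python
from typing import Callable

# Hypothetical event shape: {"type": str, "schema_version": int, "data": dict}
Event = dict

def upcast_v1_to_v2(event: Event) -> Event:
    """v1 stored a single 'name'; v2 splits it into first and last name."""
    first, _, last = event["data"]["name"].partition(" ")
    return {
        "type": event["type"],
        "schema_version": 2,
        "data": {"first_name": first, "last_name": last},
    }

# Maps a schema version to the function that lifts it to the next version.
UPCASTERS: dict[int, Callable[[Event], Event]] = {1: upcast_v1_to_v2}
LATEST_VERSION = 2

def upcast(event: Event) -> Event:
    """Apply upcasters in sequence until the event is at the latest schema."""
    while event["schema_version"] < LATEST_VERSION:
        event = UPCASTERS[event["schema_version"]](event)
    return event

old = {"type": "UserRegistered", "schema_version": 1, "data": {"name": "Ada Lovelace"}}
new = upcast(old)   # a new dict; the stored v1 event is left untouched
```

Chaining one upcaster per version step means that adding a v3 schema later only requires one new function, not a rewrite of every earlier converter.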

Reference Architecture

Client
  |
  v
Command Handler ---> Event Store ---> Event Processor ---> Read Model DB
                       |                 ^
                       |                 |
                    Snapshot Store ------
                       |
                       v
                 Event Replay

Components

Command Handler

Custom application logic

Receives commands, validates them, and generates new events.
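As a rough illustration, a command handler can be a pure function from current state and a command to a list of new events. The `Withdraw` command, `MoneyWithdrawn` event, and `CommandRejected` exception below are hypothetical names from an assumed banking example, not part of the design itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Withdraw:               # hypothetical command
    account_id: str
    amount: int

@dataclass(frozen=True)
class MoneyWithdrawn:         # hypothetical event
    account_id: str
    amount: int

class CommandRejected(Exception):
    """Raised when a command fails validation; no event is produced."""

def handle_withdraw(balance: int, cmd: Withdraw) -> list[MoneyWithdrawn]:
    """Validate the command against current state and return new events."""
    if cmd.amount <= 0:
        raise CommandRejected("amount must be positive")
    if cmd.amount > balance:
        raise CommandRejected("insufficient funds")
    return [MoneyWithdrawn(cmd.account_id, cmd.amount)]

events = handle_withdraw(balance=100, cmd=Withdraw("acc-1", 30))
```

Keeping the handler free of I/O makes the validation rules easy to test; appending the returned events to the Event Store is a separate step.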

Event Store

Append-only log database (e.g., Apache Kafka, EventStoreDB)

Stores all events immutably in order.
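To make the append-only idea concrete, the following in-memory sketch stores events per aggregate and rebuilds state by replaying them. It is a toy stand-in and not how Kafka or EventStoreDB work internally; `InMemoryEventStore`, `StoredEvent`, and the account events are assumed names.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredEvent:
    aggregate_id: str
    version: int          # position within the aggregate's stream, starting at 1
    event_type: str
    data: dict

class InMemoryEventStore:
    """Append-only store: events can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._streams: dict[str, list[StoredEvent]] = defaultdict(list)

    def append(self, aggregate_id: str, event_type: str, data: dict) -> StoredEvent:
        stream = self._streams[aggregate_id]
        event = StoredEvent(aggregate_id, len(stream) + 1, event_type, data)
        stream.append(event)
        return event

    def read_stream(self, aggregate_id: str, after_version: int = 0) -> list[StoredEvent]:
        # .get() avoids creating an empty stream as a side effect of reading.
        return [e for e in self._streams.get(aggregate_id, []) if e.version > after_version]

def rebuild_balance(events: list[StoredEvent]) -> int:
    """Fold the event history into current state."""
    balance = 0
    for e in events:
        if e.event_type == "Deposited":
            balance += e.data["amount"]
        elif e.event_type == "Withdrawn":
            balance -= e.data["amount"]
    return balance

store = InMemoryEventStore()
store.append("acc-1", "Deposited", {"amount": 100})
store.append("acc-1", "Withdrawn", {"amount": 30})
balance = rebuild_balance(store.read_stream("acc-1"))
```

The store exposes no update or delete operation, so the history read back for auditing is exactly the history that was written.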

Event Processor

Background worker or stream processor (e.g., Kafka Streams, Apache Flink)

Processes events to update read models and trigger side effects.
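Brokers commonly deliver an event more than once, so a processor should tolerate duplicates. One possible approach, sketched here, is to remember the last applied version per aggregate and skip anything at or below it. The sketch assumes events for a single aggregate arrive in order; `BalanceProjection` and the event fields are hypothetical.

```python
class BalanceProjection:
    """Read model keyed by account, updated from events (hypothetical example)."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self._last_version: dict[str, int] = {}   # last applied version per aggregate

    def handle(self, event: dict) -> bool:
        """Apply the event once; return False if it was already applied."""
        agg = event["aggregate_id"]
        if event["version"] <= self._last_version.get(agg, 0):
            return False                          # duplicate delivery, skip it
        delta = event["amount"] if event["type"] == "Deposited" else -event["amount"]
        self.balances[agg] = self.balances.get(agg, 0) + delta
        self._last_version[agg] = event["version"]
        return True

projection = BalanceProjection()
deposit = {"aggregate_id": "acc-1", "version": 1, "type": "Deposited", "amount": 100}
first = projection.handle(deposit)
second = projection.handle(deposit)   # same event redelivered by the broker
```

In a persistent read model, the last applied version would need to be written in the same transaction as the balance update, otherwise a crash between the two writes could reintroduce the duplicate.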

Snapshot Store

Key-value store or database (e.g., Redis, Cassandra)

Stores periodic snapshots of state to speed up rebuilding.
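A short sketch of how a snapshot shortens a rebuild: state starts from the snapshot and only events newer than the snapshot's version are replayed. The `Snapshot` fields mirror the schema used later in this design; the balance example and `load_balance` function are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Snapshot:
    aggregate_id: str
    last_event_version: int   # version of the last event folded into the snapshot
    balance: int

def load_balance(events: list[dict], snapshot: Optional[Snapshot]) -> tuple[int, int]:
    """Return (balance, number_of_events_replayed)."""
    balance = snapshot.balance if snapshot else 0
    start = snapshot.last_event_version if snapshot else 0
    replayed = 0
    for e in events:
        # A real store would fetch only events after `start`; filtering here
        # keeps the sketch self-contained.
        if e["version"] <= start:
            continue
        balance += e["amount"]
        replayed += 1
    return balance, replayed

# 1,000 deposits of 1 unit each, versions 1..1000.
events = [{"version": v, "amount": 1} for v in range(1, 1001)]
snapshot = Snapshot("acc-1", last_event_version=900, balance=900)

full = load_balance(events, None)       # replays all 1,000 events
fast = load_balance(events, snapshot)   # replays only the last 100
```

Both calls arrive at the same balance, which is the property to verify whenever snapshot logic changes: a snapshot is a cache of the replay, never a second source of truth.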

Read Model Database

Relational or NoSQL database (e.g., PostgreSQL, MongoDB)

Stores query-optimized views of current state.

Message Broker

Event streaming platform or message queue (e.g., Kafka, RabbitMQ)

Distributes events to processors and other services.

Request Flow

1. Client sends a command to the Command Handler.

2. Command Handler validates and creates one or more events.

3. Events are appended to the Event Store in order.

4. Event Processor consumes events from the Event Store or Message Broker.

5. Event Processor updates Read Model Database and optionally creates snapshots.

6. Client queries the Read Model Database for current state.

7. If needed, the system rebuilds state by replaying events from the Event Store, starting from the latest snapshot to shorten the replay.
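Steps 1 to 6 can be sketched end to end with in-memory stand-ins, which also shows why the read model is only eventually consistent: a query issued before the processor runs still sees the old state. All names below are hypothetical, and a list, a deque, and a dict stand in for the real Event Store, Message Broker, and Read Model Database.

```python
from collections import deque

event_log: list[dict] = []          # stands in for the Event Store
broker: deque = deque()             # stands in for the Message Broker
read_model: dict[str, int] = {}     # stands in for the Read Model Database

def handle_command(account_id: str, amount: int) -> None:
    """Steps 1-3: validate, create an event, append it, then publish it."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    event = {"aggregate_id": account_id, "position": len(event_log) + 1,
             "type": "Deposited", "amount": amount}
    event_log.append(event)
    broker.append(event)

def run_processor() -> int:
    """Steps 4-5: drain the broker and update the read model."""
    processed = 0
    while broker:
        e = broker.popleft()
        read_model[e["aggregate_id"]] = read_model.get(e["aggregate_id"], 0) + e["amount"]
        processed += 1
    return processed

def query_balance(account_id: str) -> int:
    """Step 6: queries hit the read model only, never the event log."""
    return read_model.get(account_id, 0)

handle_command("acc-1", 50)
stale = query_balance("acc-1")      # processor has not run yet
run_processor()
fresh = query_balance("acc-1")
```

Here `stale` reflects the state before the deposit and `fresh` the state after it; the gap between the two is the read model lag that NFR5 accepts.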

Database Schema

Entities:

- Event: {event_id (PK), aggregate_id, event_type, event_data (JSON), timestamp, version}

- Snapshot: {snapshot_id (PK), aggregate_id, snapshot_data (JSON), last_event_version, timestamp}

- ReadModel: Query-optimized tables depending on domain, updated by event processors.

Relationships:

- Events belong to an aggregate identified by aggregate_id.

- Snapshots correspond to aggregates and represent state at a certain event version.
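One way to realize the Event and Snapshot entities is sketched below with Python's built-in sqlite3 module, chosen only so the example runs without a server. The column types and the UNIQUE constraint on (aggregate_id, version) are assumptions added here; the constraint lets the database reject a second writer that tries to claim the same version.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    event_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_id TEXT    NOT NULL,
    event_type   TEXT    NOT NULL,
    event_data   TEXT    NOT NULL,          -- JSON payload
    timestamp    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    version      INTEGER NOT NULL,
    UNIQUE (aggregate_id, version)          -- assumption: one event per version
);
CREATE TABLE snapshots (
    snapshot_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_id       TEXT    NOT NULL,
    snapshot_data      TEXT    NOT NULL,    -- JSON payload
    last_event_version INTEGER NOT NULL,
    timestamp          TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")

def append_event(aggregate_id: str, version: int, event_type: str, data: dict) -> bool:
    """Insert an event; return False if that version already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (aggregate_id, event_type, event_data, version) "
                "VALUES (?, ?, ?, ?)",
                (aggregate_id, event_type, json.dumps(data), version),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = append_event("acc-1", 1, "Deposited", {"amount": 100})
conflict = append_event("acc-1", 1, "Withdrawn", {"amount": 30})  # same version again
rows = conn.execute(
    "SELECT version, event_type FROM events WHERE aggregate_id = ? ORDER BY version",
    ("acc-1",),
).fetchall()
```

The second append fails because version 1 is already taken, so the caller would reload the aggregate, re-validate the command, and retry with version 2.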

Scaling Discussion

Bottlenecks

Event Store write throughput limits at very high event rates.

Event replay latency grows with event history size.

Read Model update lag under heavy event load.

Snapshot storage and retrieval overhead.

Concurrent commands on the same aggregate can produce conflicting events.

Solutions

Partition Event Store by aggregate or topic to increase write throughput.

Use snapshotting to reduce event replay time.

Scale Event Processors horizontally with partitioned event streams.

Tune snapshot frequency to balance storage cost against replay speed.

Implement optimistic concurrency control and conflict resolution strategies.
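Partitioning by aggregate, as suggested in the first and third solutions, usually means routing every event of one aggregate to the same partition so that per-aggregate order is preserved while different aggregates are processed in parallel. A minimal sketch follows, assuming 8 partitions and a SHA-256 based hash; both choices are illustrative and not prescribed by the design.

```python
import hashlib

NUM_PARTITIONS = 8   # assumption for illustration

def partition_for(aggregate_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an aggregate to a partition with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    digest is used to get the same answer on every node.
    """
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

ids = [f"acc-{i}" for i in range(1000)]
assignment = {agg: partition_for(agg) for agg in ids}
```

Because the mapping depends on the partition count, changing `NUM_PARTITIONS` later moves aggregates between partitions, which is worth raising as a trade-off in the scaling discussion.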

Interview Tips

Time: For a 45-minute interview, spend 10 minutes clarifying requirements and constraints, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, and 5 minutes summarizing.

Explain why event sourcing stores state changes as events.

Describe how event replay and snapshots work together.

Discuss how CQRS separates command and query responsibilities.

Highlight handling of event versioning and schema evolution.

Address scaling challenges and concurrency control.