Event sourcing in HLD - System Design Exercise

Design: Event Sourcing System
The design covers event storage, event processing, state rebuilding, and read model projection. UI design and specific business domain logic are out of scope.
Functional Requirements
FR1: Store all changes to application state as a sequence of events.
FR2: Rebuild current state by replaying events.
FR3: Support querying current state efficiently.
FR4: Allow auditing and debugging by inspecting event history.
FR5: Support event versioning and schema evolution.
FR6: Handle concurrent updates safely.
Non-Functional Requirements
NFR1: System must handle 10,000 events per second.
NFR2: Event replay latency for rebuilding state should be under 5 seconds for 1 million events.
NFR3: Availability target of 99.9% uptime.
NFR4: Event storage must be durable and immutable.
NFR5: Support eventual consistency for read models.
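The targets in NFR1 and NFR2 translate into a few back-of-envelope numbers worth stating early. The sketch below derives them; the 1 KiB average event size is an assumption made only for illustration and is not part of the requirements.

```python
# Back-of-envelope capacity numbers derived from NFR1 and NFR2.
EVENTS_PER_SECOND = 10_000      # NFR1
REPLAY_EVENTS = 1_000_000       # NFR2
REPLAY_BUDGET_SECONDS = 5       # NFR2
ASSUMED_EVENT_BYTES = 1_024     # assumption: average event size, not given in the requirements

SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
required_replay_rate = REPLAY_EVENTS // REPLAY_BUDGET_SECONDS
bytes_per_day = events_per_day * ASSUMED_EVENT_BYTES

print(f"events/day: {events_per_day:,}")                           # 864,000,000
print(f"replay rate needed: {required_replay_rate:,} events/s")    # 200,000
print(f"raw storage/day: {bytes_per_day / 1e9:.1f} GB")            # 884.7
```

The replay figure is the notable one: meeting NFR2 by raw replay alone needs about 200,000 events per second per rebuild, which is part of why snapshots appear later in the design.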
Key Components
Event Store (append-only log storage)
Command Handler (validates and creates events)
Event Processor (updates read models)
Snapshot Store (to speed up state rebuilding)
Read Model Database (for queries)
Message Broker (for event distribution)
Design Patterns
Event Sourcing pattern
CQRS (Command Query Responsibility Segregation)
Snapshotting
Event Versioning and Upcasting
Idempotent Event Processing
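Of the patterns listed, upcasting is the least self-explanatory: old events stay untouched in the store and are converted to the newest schema when they are read. A minimal sketch follows, assuming each event carries a hypothetical schema_version field and using a made-up CustomerRegistered event whose v1 "name" field was split into two fields in v2.

```python
from typing import Callable, Dict, Tuple


def _customer_registered_v1_to_v2(data: dict) -> dict:
    # Hypothetical schema change: v1 stored "name", v2 stores first and last name.
    first, _, last = data["name"].partition(" ")
    return {"first_name": first, "last_name": last}


# Maps (event_type, from_version) to a function producing the next version's data.
UPCASTERS: Dict[Tuple[str, int], Callable[[dict], dict]] = {
    ("CustomerRegistered", 1): _customer_registered_v1_to_v2,
}


def upcast(event: dict) -> dict:
    """Apply upcasters one version at a time until none is registered."""
    event_type = event["event_type"]
    version = event["schema_version"]
    data = event["event_data"]
    while (event_type, version) in UPCASTERS:
        data = UPCASTERS[(event_type, version)](data)
        version += 1
    # Return a new dict so the stored event is never mutated.
    return {**event, "schema_version": version, "event_data": data}


old_event = {
    "event_type": "CustomerRegistered",
    "schema_version": 1,
    "event_data": {"name": "Ada Lovelace"},
}
print(upcast(old_event)["event_data"])  # {'first_name': 'Ada', 'last_name': 'Lovelace'}
```

Chaining single-step upcasters keeps each conversion small; a v1 event reaches v3 by passing through the v1-to-v2 and v2-to-v3 functions in turn.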
Reference Architecture
Client
  |
  v
Command Handler ---> Event Store ---> Event Processor ---> Read Model DB
                       |                 ^
                       |                 |
                    Snapshot Store ------
                       |
                       v
                 Event Replay
Components
Command Handler
Custom application logic
Receives commands, validates them, and generates new events.
Event Store
Append-only log database (e.g., Apache Kafka, EventStoreDB)
Stores all events immutably in order.
Event Processor
Background worker or stream processor (e.g., Kafka Streams, Apache Flink)
Processes events to update read models and trigger side effects.
Snapshot Store
Key-value store or database (e.g., Redis, Cassandra)
Stores periodic snapshots of state to speed up rebuilding.
Read Model Database
Relational or NoSQL database (e.g., PostgreSQL, MongoDB)
Stores query-optimized views of current state.
Message Broker
Event streaming platform or message queue (e.g., Kafka, RabbitMQ)
Distributes events to processors and other services.
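To make the division of responsibilities concrete, here is a minimal in-memory sketch of three of the components: a Command Handler, an Event Store, and an Event Processor that maintains a read model. The class names, the MoneyDeposited event, and the account domain are all hypothetical; real deployments would use the technologies named above instead of Python lists and dicts.

```python
from dataclasses import dataclass
from typing import Dict, List, Set
import uuid


@dataclass(frozen=True)
class Event:
    event_id: str
    aggregate_id: str
    event_type: str
    event_data: dict
    version: int


class InMemoryEventStore:
    """Append-only list standing in for the Event Store."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def stream(self, aggregate_id: str) -> List[Event]:
        return [e for e in self._events if e.aggregate_id == aggregate_id]

    def read_all(self) -> List[Event]:
        return list(self._events)


class DepositHandler:
    """Command Handler: validates a command and emits an event."""

    def __init__(self, store: InMemoryEventStore) -> None:
        self._store = store

    def handle(self, account_id: str, amount: int) -> Event:
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        next_version = len(self._store.stream(account_id)) + 1
        event = Event(str(uuid.uuid4()), account_id, "MoneyDeposited",
                      {"amount": amount}, next_version)
        self._store.append(event)
        return event


class BalanceProjector:
    """Event Processor: keeps a query-friendly balance per account."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}  # stands in for the Read Model Database
        self._seen: Set[str] = set()        # processed event ids, for idempotency

    def process(self, event: Event) -> None:
        if event.event_id in self._seen:
            return  # duplicate delivery, ignore
        if event.event_type == "MoneyDeposited":
            current = self.balances.get(event.aggregate_id, 0)
            self.balances[event.aggregate_id] = current + event.event_data["amount"]
        self._seen.add(event.event_id)


store = InMemoryEventStore()
handler = DepositHandler(store)
projector = BalanceProjector()

handler.handle("acct-1", 50)
handler.handle("acct-1", 25)

# Deliver every event twice to mimic at-least-once delivery from a broker.
for event in store.read_all() + store.read_all():
    projector.process(event)

print(projector.balances)  # {'acct-1': 75}
```

Tracking processed event ids is one simple way to get idempotent processing; storing the last applied version per aggregate is a common alternative that uses less memory.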
Request Flow
1. Client sends a command to the Command Handler.
2. Command Handler validates and creates one or more events.
3. Events are appended to the Event Store in order.
4. Event Processor consumes events from the Event Store or Message Broker.
5. Event Processor updates Read Model Database and optionally creates snapshots.
6. Client queries the Read Model Database for current state.
7. If needed, system rebuilds state by replaying events from Event Store, using snapshots to optimize.
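Step 7 can be sketched as a fold over events that starts from a snapshot when one exists. The example below is illustrative only: the AmountAdded event and the running-total state are hypothetical, while the snapshot_data and last_event_version names follow the Snapshot entity in the schema.

```python
from typing import Iterable, Optional


def apply(state: int, event: dict) -> int:
    """Fold one event into the state (a hypothetical running total)."""
    if event["event_type"] == "AmountAdded":
        return state + event["event_data"]["amount"]
    return state  # unknown event types leave the state unchanged


def rebuild(events: Iterable[dict], snapshot: Optional[dict] = None) -> int:
    """Replay events in version order, starting from a snapshot if one exists."""
    state = snapshot["snapshot_data"] if snapshot else 0
    last_version = snapshot["last_event_version"] if snapshot else 0
    for event in sorted(events, key=lambda e: e["version"]):
        if event["version"] > last_version:  # skip events the snapshot already covers
            state = apply(state, event)
    return state


def take_snapshot(events: Iterable[dict], up_to_version: int) -> dict:
    """Capture the state as of up_to_version."""
    covered = [e for e in events if e["version"] <= up_to_version]
    return {"snapshot_data": rebuild(covered), "last_event_version": up_to_version}


history = [
    {"event_type": "AmountAdded", "event_data": {"amount": n}, "version": n}
    for n in range(1, 11)
]
snapshot = take_snapshot(history, up_to_version=8)

print(rebuild(history))            # 55
print(rebuild(history, snapshot))  # 55, but only versions 9 and 10 are applied
```

For brevity this sketch filters an in-memory list; a real store would be asked only for events with a version greater than last_event_version, so the skipped events are never read at all.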
Database Schema
Entities:
- Event: {event_id (PK), aggregate_id, event_type, event_data (JSON), timestamp, version}
- Snapshot: {snapshot_id (PK), aggregate_id, snapshot_data (JSON), last_event_version, timestamp}
- ReadModel: Query-optimized tables depending on domain, updated by event processors.
Relationships:
- Events belong to an aggregate identified by aggregate_id.
- Snapshots correspond to aggregates and represent state at a certain event version.
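As a rough illustration, the Event and Snapshot entities can be expressed as tables using Python's built-in sqlite3 module. The UNIQUE (aggregate_id, version) constraint is an assumption added here, not part of the schema above; it makes the database reject two events claiming the same version of one aggregate. The helper names append_event and load_stream are hypothetical.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE events (
    event_id     TEXT PRIMARY KEY,
    aggregate_id TEXT NOT NULL,
    event_type   TEXT NOT NULL,
    event_data   TEXT NOT NULL,       -- JSON document
    timestamp    TEXT NOT NULL,
    version      INTEGER NOT NULL,
    UNIQUE (aggregate_id, version)    -- assumption: one event per version per aggregate
);
CREATE TABLE snapshots (
    snapshot_id        TEXT PRIMARY KEY,
    aggregate_id       TEXT NOT NULL,
    snapshot_data      TEXT NOT NULL, -- JSON document
    last_event_version INTEGER NOT NULL,
    timestamp          TEXT NOT NULL
);
"""


def append_event(conn, aggregate_id, event_type, event_data, version):
    """Insert one event; raises sqlite3.IntegrityError on a version clash."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), aggregate_id, event_type, json.dumps(event_data),
         datetime.now(timezone.utc).isoformat(), version),
    )


def load_stream(conn, aggregate_id, after_version=0):
    """Return (event_type, event_data, version) tuples in version order."""
    rows = conn.execute(
        "SELECT event_type, event_data, version FROM events "
        "WHERE aggregate_id = ? AND version > ? ORDER BY version",
        (aggregate_id, after_version),
    )
    return [(etype, json.loads(data), version) for etype, data, version in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
append_event(conn, "order-1", "OrderCreated", {"total": 40}, 1)
append_event(conn, "order-1", "OrderPaid", {}, 2)

print(load_stream(conn, "order-1", after_version=1))  # [('OrderPaid', {}, 2)]
```

The after_version parameter is what a snapshot-based rebuild would pass: the last_event_version recorded in the snapshot row.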
Scaling Discussion
Bottlenecks
Event Store write throughput limits at very high event rates.
Event replay latency grows with event history size.
Read Model update lag under heavy event load.
Snapshot storage and retrieval overhead.
Concurrent commands against the same aggregate can produce conflicting events.
Solutions
Partition Event Store by aggregate or topic to increase write throughput.
Use snapshotting to reduce event replay time.
Scale Event Processors horizontally with partitioned event streams.
Tune snapshot frequency to balance storage cost against replay speed.
Implement optimistic concurrency control and conflict resolution strategies.
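Two of these solutions, partitioning by aggregate and optimistic concurrency control, fit in one small sketch. It assumes a writer states the version it last read (expected_version) and the store rejects the append if the stream has moved on. The PartitionedEventStore class and ConcurrencyError exception are hypothetical names, and the hash-modulo routing is just one simple partitioning scheme.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


class ConcurrencyError(Exception):
    """Raised when the stream changed since the writer last read it."""


class PartitionedEventStore:
    def __init__(self, partitions: int = 4) -> None:
        # Each partition holds its own map of aggregate_id -> ordered events.
        self._partitions: List[Dict[str, list]] = [
            defaultdict(list) for _ in range(partitions)
        ]

    def partition_for(self, aggregate_id: str) -> int:
        # A stable hash keeps every event of one aggregate in the same partition,
        # which preserves per-aggregate ordering.
        digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._partitions)

    def current_version(self, aggregate_id: str) -> int:
        return len(self._partitions[self.partition_for(aggregate_id)][aggregate_id])

    def append(self, aggregate_id: str, expected_version: int, event_data: dict) -> int:
        stream = self._partitions[self.partition_for(aggregate_id)][aggregate_id]
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        stream.append(event_data)
        return len(stream)  # the new version of the aggregate


store = PartitionedEventStore()

# Two writers read the same version, then both try to append.
seen_by_a = store.current_version("cart-7")
seen_by_b = store.current_version("cart-7")

store.append("cart-7", seen_by_a, {"item": "book"})
try:
    store.append("cart-7", seen_by_b, {"item": "pen"})
except ConcurrencyError:
    # One possible resolution: re-read the version, revalidate, and retry.
    store.append("cart-7", store.current_version("cart-7"), {"item": "pen"})

print(store.current_version("cart-7"))  # 2
```

Retrying is only one conflict resolution strategy; depending on the domain, the second command might instead be rejected or merged after re-running its validation against the updated state.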
Interview Tips
Time (about 45 minutes in total): Spend 10 minutes clarifying requirements and constraints, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, and 5 minutes summarizing.
Explain why event sourcing stores state changes as events.
Describe how event replay and snapshots work together.
Discuss how CQRS separates command and query responsibilities.
Highlight handling of event versioning and schema evolution.
Address scaling challenges and concurrency control.