
Event store concept in Microservices - Scalability & System Analysis

Scalability Analysis - Event store concept
Growth Table: Event Store Scaling
| Metric | 100 Users | 10K Users | 1M Users | 100M Users |
|---|---|---|---|---|
| Event Volume per Day | ~10K events | ~1M events | ~100M events | ~10B events |
| Event Store Size | ~100 MB | ~10 GB | ~1 TB | ~100 TB+ |
| Write Throughput | ~100 QPS | ~10K QPS | ~1M QPS | ~100M QPS (distributed) |
| Read Throughput | ~100 QPS | ~10K QPS | ~1M QPS | ~100M QPS (distributed) |
| Latency | Low (ms) | Low (ms) | Moderate (ms to tens of ms) | Higher (tens of ms) |
| Infrastructure | Single server or small cluster | Cluster with replication | Sharded clusters, partitioned storage | Global distributed clusters, multi-region |
First Bottleneck

The first bottleneck is the write throughput of the event store database. As event volume grows, a single node struggles to absorb the writes while keeping latency low, because an event store issues many small appends that can saturate that node's disk I/O and CPU.

Scaling Solutions
  • Horizontal scaling: Add more event store nodes and partition events by aggregate or stream ID (sharding) to distribute write load.
  • Write batching: Group multiple events into batches to reduce I/O overhead.
  • Caching: Use in-memory caches for recent events or snapshots to speed up reads.
  • Event snapshots: Periodically store snapshots of aggregate state to reduce replay time.
  • Replication: Use read replicas to scale read throughput and improve availability.
  • Storage tiering: Archive older events to cheaper, slower storage to keep hot storage performant.
  • Use specialized event stores: Systems optimized for append-only workloads (e.g., EventStoreDB, or Apache Kafka used as a distributed log) improve performance.
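The sharding and write-batching ideas above can be sketched as follows. This is a minimal illustration, not a production design: `NUM_SHARDS`, `shard_for`, and `group_into_batches` are hypothetical names, and simple modulo hashing is assumed (changing the shard count would remap most streams, so a real deployment might prefer consistent hashing).

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(stream_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a stream ID to a shard index using a stable hash.

    All events of one stream land on the same shard, so their
    relative order can be preserved by that shard alone.
    """
    digest = hashlib.sha256(stream_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_into_batches(events, num_shards: int = NUM_SHARDS, batch_size: int = 100):
    """Group (stream_id, payload) pairs into per-shard batches.

    Returns {shard_index: [batch, batch, ...]} where each batch is a list
    of at most `batch_size` events in their original arrival order.
    """
    per_shard = defaultdict(list)
    for stream_id, payload in events:
        per_shard[shard_for(stream_id, num_shards)].append((stream_id, payload))

    batches = {}
    for shard, items in per_shard.items():
        batches[shard] = [
            items[i:i + batch_size] for i in range(0, len(items), batch_size)
        ]
    return batches
```

Each batch could then be written to its shard in one round trip, which is where the I/O saving from batching would come from.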
Back-of-Envelope Cost Analysis
  • At 10K users generating 1M events/day, expect ~12 QPS sustained writes (1M / 86400 seconds).
  • At 1M users generating 100M events/day, expect ~1,157 QPS sustained writes.
  • Assuming roughly 1 KB per event, 100M events/day adds ~100 GB of storage per day.
  • Network bandwidth must support event replication and client reads; a 1 Gbps link carries ~125 MB/s, enough for ~125K events/s at 1 KB each (before protocol overhead).
  • CPU and disk I/O must be provisioned to handle peak bursts, not just average QPS.
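The arithmetic behind these estimates can be reproduced with a few lines of Python. This is a rough sketch under the same assumptions as the bullets (1 KB, taken as 1,000 bytes, per event; averages rather than peaks); the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
EVENT_SIZE_BYTES = 1_000  # assumption from the text: roughly 1 KB per event


def average_write_qps(events_per_day: int) -> float:
    """Average writes per second if events were spread evenly over a day."""
    return events_per_day / SECONDS_PER_DAY


def daily_storage_gb(events_per_day: int, event_size_bytes: int = EVENT_SIZE_BYTES) -> float:
    """Raw storage added per day in GB (10^9 bytes), ignoring indexes and replicas."""
    return events_per_day * event_size_bytes / 1e9


def max_events_per_second(link_gbps: float, event_size_bytes: int = EVENT_SIZE_BYTES) -> float:
    """Upper bound on events/s a link can carry, ignoring protocol overhead."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second / event_size_bytes
```

For example, `average_write_qps(100_000_000)` is about 1,157, `daily_storage_gb(100_000_000)` is 100, and `max_events_per_second(1.0)` is 125,000, matching the figures above. Peak traffic is usually well above the average, so these numbers are a floor for provisioning rather than a target.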
Interview Tip

Start by explaining the event store's role as an append-only log of events. Discuss how writes dominate the workload and how latency matters. Then, identify the database write throughput as the first bottleneck. Propose sharding and replication as solutions. Mention caching and snapshots to optimize reads. Finally, consider storage growth and archival strategies. Keep your explanation clear and structured.

Self Check Question

Your event store database handles 1000 QPS writes. Traffic grows 10x to 10,000 QPS. What do you do first and why?

Answer: The first step is to shard the event store by partitioning events across multiple nodes. This distributes the write load so no single database node is overwhelmed, allowing the system to handle 10x more writes without latency spikes.
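To put a number on the answer, the shard count can be estimated from the target load and per-node capacity. The sketch below assumes that the original 1,000 QPS is roughly what one node can sustain and that 30% headroom is kept for bursts; both are assumptions for illustration, and `shards_needed` is a hypothetical helper.

```python
import math


def shards_needed(target_qps: float, per_node_qps: float, headroom: float = 0.3) -> int:
    """Shards required so each node runs below (1 - headroom) of its capacity.

    `headroom` is the fraction of per-node capacity reserved for bursts.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_qps = per_node_qps * (1.0 - headroom)
    return math.ceil(target_qps / usable_qps)
```

With these assumptions, `shards_needed(10_000, 1_000)` gives 15 nodes, while running every node at full capacity (`headroom=0`) would need 10.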

Key Result
The event store first breaks at database write throughput as event volume grows. Sharding and replication are key to scaling writes and reads efficiently.

Practice

1. What is the primary purpose of an event store in a microservices architecture?
easy
A. To save every change as an immutable event in order
B. To store user credentials securely
C. To cache frequently accessed data for faster reads
D. To manage service discovery and load balancing

Solution

  1. Step 1: Understand event store role

    An event store records all changes as events, preserving order and immutability.
  2. Step 2: Compare options with event store purpose

    Only option A, "To save every change as an immutable event in order," matches the event store's main function. The other options describe credential storage, caching, and service discovery.
  3. Final Answer:

    To save every change as an immutable event in order -> Option A
  4. Quick Check:

    Event store = immutable ordered events [OK]
Hint: An event store saves changes as events; it is not a cache or a credential store [OK]
Common Mistakes:
  • Confusing event store with caching layer
  • Thinking event store manages security or load balancing
  • Assuming event store modifies events after saving
2. Which of the following best describes the structure of data in an event store?
easy
A. A mutable key-value store with random access
B. An append-only log of immutable events
C. A relational database with tables and joins
D. A cache with time-to-live expiration

Solution

  1. Step 1: Identify event store data structure

    Event stores keep data as an append-only log where events cannot be changed once stored.
  2. Step 2: Match options to event store structure

    Option B, "An append-only log of immutable events," is the only one that matches. The other options describe mutable key-value stores, relational databases, and caches.
  3. Final Answer:

    An append-only log of immutable events -> Option B
  4. Quick Check:

    Event store = append-only immutable log [OK]
Hint: Event store data is append-only and immutable, not mutable [OK]
Common Mistakes:
  • Thinking event store allows event updates
  • Confusing event store with relational databases
  • Assuming event store is a cache with expiration
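The append-only, immutable structure from the first two questions can be illustrated with a small in-memory sketch. `Event` and `InMemoryEventStore` are hypothetical names, and a real event store would add durability, per-stream ordering, and concurrency control that are omitted here.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, List, Mapping, Tuple


@dataclass(frozen=True)
class Event:
    """A stored event; frozen so its fields cannot be reassigned."""
    sequence: int
    event_type: str
    data: Mapping[str, Any]


class InMemoryEventStore:
    """Append-only log: events can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._log: List[Event] = []

    def append(self, event_type: str, data: dict) -> Event:
        # Copy the payload and wrap it read-only so callers cannot mutate it later.
        event = Event(len(self._log) + 1, event_type, MappingProxyType(dict(data)))
        self._log.append(event)
        return event

    def read_all(self) -> Tuple[Event, ...]:
        # Return a tuple so the internal list is not exposed for modification.
        return tuple(self._log)
```

Attempting `event.event_type = "Other"` raises `dataclasses.FrozenInstanceError`, and assigning into `event.data` raises `TypeError`, which mirrors the rule that stored events are never changed.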
3. Given the following sequence of events stored in an event store:
1: UserCreated {userId: 1, name: "Alice"}
2: UserNameUpdated {userId: 1, name: "Alicia"}
3: UserDeleted {userId: 1}

What is the current state of the user with userId=1 after replaying these events?
medium
A. User with name "Alice" and deleted flag true
B. User with name "Alicia" exists
C. User with name "Alice" exists
D. User does not exist

Solution

  1. Step 1: Replay events in order

    First event creates user Alice, second updates name to Alicia, third deletes the user.
  2. Step 2: Determine final user state

    After the deletion event, the user no longer exists, regardless of the earlier name change.
  3. Final Answer:

    User does not exist -> Option D
  4. Quick Check:

    Last event is deletion, so user is gone [OK]
Hint: Last event determines existence; deletion means no user [OK]
Common Mistakes:
  • Ignoring the delete event
  • Assuming user name remains after deletion
  • Confusing event replay order
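The replay in this question can be traced with a short sketch. The tuple-based event representation and the `apply` and `replay` functions are assumptions made for illustration; the event names and payloads are the ones from the question.

```python
def apply(state, event):
    """Return the new user state after applying one event (None means no user)."""
    event_type, data = event
    if event_type == "UserCreated":
        return {"userId": data["userId"], "name": data["name"]}
    if event_type == "UserNameUpdated":
        return {**state, "name": data["name"]}
    if event_type == "UserDeleted":
        return None
    raise ValueError(f"unknown event type: {event_type}")


def replay(events, state=None):
    """Fold events over the state strictly in stored order."""
    for event in events:
        state = apply(state, event)
    return state


events = [
    ("UserCreated", {"userId": 1, "name": "Alice"}),
    ("UserNameUpdated", {"userId": 1, "name": "Alicia"}),
    ("UserDeleted", {"userId": 1}),
]
```

Replaying only the first two events yields a user named "Alicia"; replaying all three yields `None`, which corresponds to option D.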
4. You notice that your event store is allowing events to be updated after they are stored. What is the main issue with this behavior?
medium
A. It enables faster event replay by skipping old events
B. It improves performance by reducing storage needs
C. It breaks the immutability principle, causing inconsistent system state
D. It allows easier debugging by fixing event data

Solution

  1. Step 1: Understand immutability in event stores

    Events must be immutable to ensure reliable replay and audit trails.
  2. Step 2: Analyze impact of updating events

    Updating events breaks immutability, leading to inconsistent or incorrect system state.
  3. Final Answer:

    It breaks the immutability principle, causing inconsistent system state -> Option C
  4. Quick Check:

    Event immutability = consistent state [OK]
Hint: Events must never change after storing [OK]
Common Mistakes:
  • Thinking event updates improve debugging
  • Assuming updates improve performance
  • Believing updates speed up replay
5. In a microservices system using an event store, how can you efficiently rebuild the current state of a service that has millions of events without replaying all events every time?
hard
A. Use snapshots to save intermediate states periodically
B. Delete old events after a certain time to reduce replay
C. Store only the latest event per entity to minimize data
D. Replay events in parallel without ordering

Solution

  1. Step 1: Identify replay challenges with many events

    Replaying millions of events is slow and inefficient for rebuilding state.
  2. Step 2: Evaluate solutions to speed up rebuilding

    Snapshots save the state at points in time, allowing replay from snapshot forward, reducing events to process.
  3. Final Answer:

    Use snapshots to save intermediate states periodically -> Option A
  4. Quick Check:

    Snapshots optimize replay by reducing event count [OK]
Hint: Snapshots speed up state rebuild, don't delete events [OK]
Common Mistakes:
  • Deleting old events breaks audit and consistency
  • Storing only latest event loses history
  • Replaying events out of order causes errors
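A minimal sketch of snapshot-based rebuilding, assuming a toy aggregate whose state is a running total and whose events are integer deltas. `Snapshot`, `rebuild`, and `take_snapshot` are hypothetical names; the point is only that replay starts from the snapshot's version instead of from the first event.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into `state`
    state: int


def apply(state: int, event: int) -> int:
    """Toy reducer: each event is a delta added to a running total."""
    return state + event


def rebuild(events: Sequence[int], snapshot: Optional[Snapshot] = None) -> Tuple[int, int]:
    """Return (state, events_replayed), starting from the snapshot if one is given."""
    version = snapshot.version if snapshot else 0
    state = snapshot.state if snapshot else 0
    replayed = 0
    for event in events[version:]:
        state = apply(state, event)
        replayed += 1
    return state, replayed


def take_snapshot(events: Sequence[int]) -> Snapshot:
    """Fold all given events and record how many were covered."""
    state, _ = rebuild(events)
    return Snapshot(version=len(events), state=state)
```

With 1,000 events and a snapshot taken after the first 990, `rebuild(events, snapshot)` replays only the last 10 events and reaches the same state as a full replay. The full event log is kept, so history and auditability are unaffected.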