
Event sourcing pattern in Microservices - Scalability & System Analysis

Scalability Analysis - Event sourcing pattern
Growth Table: Event Sourcing Pattern
| Scale | Users | Events per Second | Storage Growth | System Changes |
|---|---|---|---|---|
| Small | 100 | ~10-100 EPS | Few MBs per day | Single event store instance, simple replay |
| Medium | 10,000 | ~1,000-5,000 EPS | GBs per day | Event store clustering, read model caching, snapshotting |
| Large | 1,000,000 | ~50,000-100,000 EPS | TBs per month | Event store sharding, asynchronous projections, CQRS separation |
| Very Large | 100,000,000 | Millions of EPS | Petabytes per year | Multi-region event stores, advanced partitioning, archival, strong-consistency trade-offs |
First Bottleneck

The event store database is the first bottleneck. It must handle high write throughput and fast reads for event replay and projections. As event volume grows, storage I/O and query latency increase, slowing down the system.

Scaling Solutions
  • Horizontal scaling: Add more event store nodes with clustering and partitioning (sharding) by aggregate or event type.
  • Snapshotting: Periodically save aggregate state snapshots to reduce replay time.
  • Read model caching: Use CQRS to separate read models and cache them for fast queries.
  • Asynchronous projections: Build read models asynchronously to reduce write path latency.
  • Archival: Move old events to cheaper storage to keep active event store performant.
  • Multi-region deployment: Distribute event stores geographically to reduce latency and increase availability.
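To make the snapshotting idea concrete, here is a minimal sketch of snapshot-assisted replay. The `Account` aggregate, event shapes, and function names are illustrative, not any particular event store's API; the point is that replay resumes from the snapshot's version instead of event zero.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative aggregate; `version` records how many events have been applied.
@dataclass
class Account:
    balance: int = 0
    version: int = 0

def apply_event(state: Account, event: dict) -> Account:
    # Each event produces a new aggregate state (events are immutable facts).
    if event["type"] == "deposited":
        return Account(state.balance + event["amount"], state.version + 1)
    if event["type"] == "withdrawn":
        return Account(state.balance - event["amount"], state.version + 1)
    return state

def rebuild(events: list, snapshot: Optional[Account] = None) -> Account:
    # With a snapshot, replay starts at snapshot.version instead of event 0,
    # which is what keeps rebuild time bounded as the event log grows.
    state = snapshot or Account()
    for event in events[state.version:]:
        state = apply_event(state, event)
    return state

events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]
full = rebuild(events)                 # replays all 3 events
snap = Account(balance=70, version=2)  # snapshot taken after event 2
fast = rebuild(events, snapshot=snap)  # replays only the last event
assert full.balance == fast.balance == 75
```

Both paths converge on the same state; the snapshot only shortens the replay, it never changes the result.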
Back-of-Envelope Cost Analysis

At 10,000 EPS, the event store must absorb ~10K writes/sec. If a single node handles ~5K writes/sec, two nodes cover the load; provision at least three for redundancy and headroom.

Storage: Assuming 1 KB per event, 10K EPS means ~10 MB/s, or ~864 GB/day. Requires scalable, tiered storage.

Bandwidth: 10K EPS * 1 KB = ~10 MB/s write bandwidth, plus read traffic for projections.
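The estimates above can be reproduced in a few lines. The node capacity and event size are the assumptions stated in the text, not measured figures:

```python
EPS = 10_000        # events per second (from the estimate above)
EVENT_SIZE_KB = 1   # assumed average event size
NODE_QPS = 5_000    # assumed single-node write capacity

write_mb_per_sec = EPS * EVENT_SIZE_KB / 1_000
storage_gb_per_day = write_mb_per_sec * 86_400 / 1_000
nodes_for_load = -(-EPS // NODE_QPS)        # ceiling division
nodes_with_redundancy = nodes_for_load + 1  # one extra node for failover

print(write_mb_per_sec)       # 10.0 MB/s write bandwidth
print(storage_gb_per_day)     # 864.0 GB/day
print(nodes_with_redundancy)  # 3 nodes
```

Swapping in your own EPS and event size gives a quick sanity check before committing to a capacity plan.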

Interview Tip

Start by explaining event sourcing basics. Then discuss how event volume grows with users. Identify the event store as the bottleneck. Propose scaling with sharding, snapshotting, and CQRS. Mention trade-offs like eventual consistency and complexity.

Self Check

Your event store database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add event store nodes and shard events by aggregate or event type to distribute load. Also implement snapshotting to reduce replay overhead.
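Sharding by aggregate can be sketched with a simple hash-based router. The shard count and function name are illustrative; real event stores differ, but the invariant is the same: all events for one aggregate land on one shard, preserving per-aggregate ordering while spreading load.

```python
import hashlib

N_SHARDS = 4  # illustrative shard count

def shard_for(aggregate_id: str, n_shards: int = N_SHARDS) -> int:
    # Hash the aggregate ID to pick a shard deterministically, so every
    # event for the same aggregate is appended to the same shard.
    digest = hashlib.sha256(aggregate_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Events for the same aggregate always route to the same shard.
assert shard_for("order-42") == shard_for("order-42")
```

Note that naive modulo hashing reshuffles most aggregates when `n_shards` changes; production systems typically use consistent hashing or fixed logical partitions for that reason.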

Key Result
Event sourcing scales well with proper event store partitioning, snapshotting, and CQRS, but the event store database is the first bottleneck as event volume grows.