| Scale | Users | Events per Second | Storage Growth | System Changes |
|---|---|---|---|---|
| Small | 100 users | ~10-100 EPS | Few MBs per day | Single event store instance, simple replay |
| Medium | 10,000 users | ~1,000-5,000 EPS | GBs per day | Event store clustering, read model caching, snapshotting |
| Large | 1,000,000 users | ~50,000-100,000 EPS | TBs per month | Event store sharding, asynchronous projections, CQRS separation |
| Very Large | 100,000,000 users | Millions EPS | Petabytes per year | Multi-region event stores, advanced partitioning, archival, strong consistency trade-offs |
# Event Sourcing Pattern in Microservices: Scalability & System Analysis
The event store database is the first bottleneck. It must handle high write throughput and fast reads for event replay and projections. As event volume grows, storage I/O and query latency increase, slowing down the system.
- Horizontal scaling: Add more event store nodes with clustering and partitioning (sharding) by aggregate or event type.
- Snapshotting: Periodically save aggregate state snapshots to reduce replay time.
- Read model caching: Use CQRS to separate read models and cache them for fast queries.
- Asynchronous projections: Build read models asynchronously to reduce write path latency.
- Archival: Move old events to cheaper storage to keep active event store performant.
- Multi-region deployment: Distribute event stores geographically to reduce latency and increase availability.
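The sharding strategy in the first bullet can be sketched as routing every event for an aggregate to the same shard via a stable hash of the aggregate ID, so replays read from a single node. This is a minimal illustration; `shard_for` and the shard count are assumed names, not the API of any specific event store:

```python
import hashlib

def shard_for(aggregate_id: str, num_shards: int) -> int:
    """Map an aggregate ID to a shard. The hash is stable across
    processes and restarts, so all events for one aggregate always
    land on (and replay from) the same shard."""
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Every event for "order-42" routes to the same shard:
shard = shard_for("order-42", num_shards=8)
```

Hashing (rather than, say, range partitioning by timestamp) avoids hot shards when recent aggregates receive most of the writes, at the cost of losing cross-aggregate ordering within a shard.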
At 10,000 EPS, the event store must absorb ~10K writes/sec. If a single node sustains ~5K writes/sec, two nodes cover the load alone, so at least three are needed to add headroom and redundancy.
Storage: Assuming 1 KB per event, 10K EPS means ~864 GB/day (10,000 × 1 KB × 86,400 s). Requires scalable storage solutions.
Bandwidth: 10K EPS * 1 KB = ~10 MB/s write bandwidth, plus read traffic for projections.
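The storage and bandwidth figures above follow directly from the assumed event size; a quick back-of-envelope check (using decimal units, 1 KB = 1,000 bytes):

```python
EVENT_SIZE_BYTES = 1_000   # assumed ~1 KB per event
EPS = 10_000               # events per second
SECONDS_PER_DAY = 86_400

# Daily write volume: events/sec * bytes/event * seconds/day
daily_bytes = EPS * EVENT_SIZE_BYTES * SECONDS_PER_DAY
print(f"{daily_bytes / 1e9:.0f} GB/day")        # 864 GB/day

# Sustained write bandwidth, before any read traffic for projections
write_mb_per_s = EPS * EVENT_SIZE_BYTES / 1e6
print(f"{write_mb_per_s:.0f} MB/s writes")      # 10 MB/s writes
```

Note these are raw event payloads only; replication factor, indexes, and projection reads multiply the real storage and bandwidth requirements.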
Start by explaining event sourcing basics. Then discuss how event volume grows with users. Identify the event store as the bottleneck. Propose scaling with sharding, snapshotting, and CQRS. Mention trade-offs like eventual consistency and complexity.
Your event store database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add event store nodes and shard events by aggregate or event type to distribute load. Also implement snapshotting to reduce replay overhead.