Design: Event Replay System for Microservices
This design covers the event capture, storage, and replay mechanisms for microservices. The internal business logic of individual microservices and UI design are out of scope.
Functional Requirements
FR1: Capture and store all events generated by microservices in an immutable log
FR2: Allow replaying events from any point in time to rebuild state or recover from failures
FR3: Support replaying events for a single microservice or multiple microservices
FR4: Ensure event ordering is preserved during replay
FR5: Provide APIs to trigger event replay with filters like time range or event type
FR6: Handle high throughput of events (up to 100,000 events per second)
FR7: Ensure minimal impact on live system performance during event capture and replay
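To make FR1 through FR5 more concrete, the following is a minimal in-memory sketch of an append-only log with a filtered replay call. Every name here (Event, EventLog, append, replay) is hypothetical and chosen for illustration only. The sketch assumes that ordering is defined by a monotonically increasing offset assigned at append time, and that a time range is inclusive at the start and exclusive at the end. A production system would need durable, partitioned storage to approach the FR6 throughput target, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Mapping, Optional, Set


@dataclass(frozen=True)
class Event:
    """A captured event. frozen=True blocks attribute reassignment (FR1)."""
    offset: int          # position in the log; defines replay order (FR4)
    service: str         # name of the producing microservice
    event_type: str
    timestamp: float     # seconds since epoch, assumed to be set by the producer
    payload: Mapping[str, Any]


class EventLog:
    """Hypothetical append-only log: events are only added, never changed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, service: str, event_type: str,
               timestamp: float, payload: Mapping[str, Any]) -> Event:
        # The offset is the index in the list, so it increases by one per event.
        event = Event(len(self._events), service, event_type, timestamp, payload)
        self._events.append(event)
        return event

    def replay(self,
               services: Optional[Set[str]] = None,
               event_types: Optional[Set[str]] = None,
               start: Optional[float] = None,
               end: Optional[float] = None) -> Iterator[Event]:
        """Yield matching events in offset order (FR2, FR3, FR5).

        A filter left as None matches everything. The time range is
        assumed to be [start, end).
        """
        for event in self._events:
            if services is not None and event.service not in services:
                continue
            if event_types is not None and event.event_type not in event_types:
                continue
            if start is not None and event.timestamp < start:
                continue
            if end is not None and event.timestamp >= end:
                continue
            yield event


# Usage: replay only the "orders" service within a time window.
log = EventLog()
log.append("orders", "OrderCreated", 100.0, {"order_id": 1})
log.append("billing", "InvoiceIssued", 105.0, {"order_id": 1})
log.append("orders", "OrderShipped", 110.0, {"order_id": 1})

for event in log.replay(services={"orders"}, start=100.0, end=120.0):
    print(event.offset, event.event_type)
```

Passing a set with several service names covers the multi-service case in FR3, and because replay iterates the log in append order, the relative order of the selected events is unchanged.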
Non-Functional Requirements
NFR1: Ingestion must sustain 100,000 events per second, matching the throughput target in FR6
NFR2: A replay of up to 1 million events should complete in under 5 minutes (a sustained replay rate of at least about 3,334 events per second)
NFR3: Availability target of 99.9% uptime (roughly 8.8 hours of allowed downtime per year)
NFR4: Event storage must be durable and immutable
NFR5: Replay must guarantee exactly-once processing semantics
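NFR5 is the hardest of these requirements to meet in a distributed setting. One common way to approximate exactly-once processing is to deliver events at least once and make the consumer idempotent, so that a redelivered event has no additional effect. The sketch below illustrates that idea and is not a statement of how this design must implement it. The names IdempotentConsumer, handle, and event_id are hypothetical, and the sketch assumes every event carries a unique, stable identifier. It keeps the set of processed IDs in memory; a real consumer would have to persist that set atomically with the state change it guards.

```python
from typing import Any, Callable, Mapping, Set


class IdempotentConsumer:
    """Hypothetical wrapper that applies each event ID at most once."""

    def __init__(self, handler: Callable[[Mapping[str, Any]], None]) -> None:
        self._handler = handler
        self._processed: Set[str] = set()  # assumption: in-memory only

    def handle(self, event_id: str, payload: Mapping[str, Any]) -> bool:
        """Apply the event unless its ID was already seen.

        Returns True if the handler ran, False if the event was a duplicate.
        """
        if event_id in self._processed:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can be retried on redelivery.
        self._processed.add(event_id)
        return True


# Usage: the same event delivered twice during a replay changes state once.
balance = {"amount": 0}


def apply_deposit(payload: Mapping[str, Any]) -> None:
    balance["amount"] += payload["amount"]


consumer = IdempotentConsumer(apply_deposit)
consumer.handle("evt-1", {"amount": 50})
consumer.handle("evt-1", {"amount": 50})  # duplicate delivery, ignored
consumer.handle("evt-2", {"amount": 25})
print(balance["amount"])
```

The trade-off in this approach is the extra lookup and storage for processed IDs, which should be weighed against the NFR1 and NFR2 throughput targets.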