Microservices | System Design | ~25 mins

Outbox pattern for reliable events in Microservices - System Design Exercise

Design: Outbox Pattern for Reliable Event Delivery
This design focuses on the microservice that implements the outbox pattern and on its event delivery mechanism. Event consumers and their processing logic are out of scope.
Functional Requirements
FR1: Ensure events generated by a microservice are reliably delivered to other services.
FR2: Guarantee that no events are lost, even if the service crashes after the database update but before the event is published.
FR3: Support eventual consistency between the service's database and event consumers.
FR4: Allow event consumers to process events asynchronously.
FR5: Handle high event throughput with minimal latency (quantified in NFR1 and NFR2).
FR6: Provide visibility into event delivery status for monitoring and debugging.
Non-Functional Requirements
NFR1: System must handle 10,000 events per second.
NFR2: p99 event delivery latency should be under 500 ms.
NFR3: System availability target is 99.9% uptime.
NFR4: Events must be delivered at least once (idempotency handled by consumers).
NFR5: Use existing relational database for service data storage.
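A quick back-of-envelope check of NFR1 and NFR2 helps size the event publisher. The sketch below assumes a hypothetical polling publisher that reads up to `batch_size` unsent rows every `poll_interval_ms` milliseconds; the function names and sample values are illustrative, not part of the design.

```python
import math

def publishers_needed(events_per_sec: int, batch_size: int, poll_interval_ms: int) -> int:
    """Smallest number of polling publishers whose combined drain rate
    (batch_size rows per poll, one poll every poll_interval_ms) covers the
    arrival rate. Assumes publish + mark-as-sent fit inside one poll interval."""
    per_instance = batch_size * 1000 / poll_interval_ms  # events/s one instance can drain
    return math.ceil(events_per_sec / per_instance)

def outbox_rows_per_day(events_per_sec: int) -> int:
    """Rows added to the outbox per day at a steady rate, before any purging."""
    return events_per_sec * 86_400

# NFR1 = 10,000 events/s. A 100 ms poll interval uses only part of NFR2's
# 500 ms p99 budget, since a new row may wait up to one interval before it is read.
print(publishers_needed(10_000, batch_size=500, poll_interval_ms=100))  # 2
print(outbox_rows_per_day(10_000))  # 864000000
```

Running at exactly the arrival rate leaves no slack for retries or bursts, so a real deployment would add headroom above this minimum. The 864 million rows per day also show why sent rows must be purged or archived.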
Think Before You Design
Key Components
Service database with outbox table
Transactional write mechanism
Event publisher component
Message broker (e.g., Kafka, RabbitMQ)
Event consumer services
Monitoring and logging tools
Design Patterns
Transactional Outbox Pattern
Event Sourcing (related, but not required by the outbox pattern)
Message Queue
Idempotent Consumer
Retry and Dead Letter Queue
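The Retry and Dead Letter Queue pattern can be sketched in a few lines. This is a minimal in-process illustration under stated assumptions: `handler` stands in for whatever publishes or processes an event, and a Python list stands in for a real DLQ (in practice a Kafka topic or a RabbitMQ dead-letter exchange). The function and parameter names are hypothetical.

```python
import random
import time
from typing import Callable, List

def deliver_with_retry(
    event: dict,
    handler: Callable[[dict], None],
    dead_letter_queue: List[dict],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call handler(event), retrying with exponential backoff and full jitter.
    After max_attempts failures the event is parked on the dead letter queue."""
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                break  # out of attempts: fall through to the DLQ
            # Full jitter: wait a random time in [0, base * 2^attempt].
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    dead_letter_queue.append(event)
    return False

# Usage: a handler that fails twice, then succeeds.
calls = {"n": 0}

def flaky(event: dict) -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("broker unavailable")

dlq: List[dict] = []
ok = deliver_with_retry({"id": "e1"}, flaky, dlq, sleep=lambda s: None)
print(ok, calls["n"], dlq)  # True 3 []
```

Injecting `sleep` keeps the sketch testable; jitter spreads retries out so many failing publishers do not hammer a recovering broker in lockstep.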
Reference Architecture
 +----------------+       +----------------+       +------------------+
 |                |       |                |       |                  |
 |  Microservice  |       |  Message       |       |  Event Consumers |
 |  +----------+  |       |  Broker        |       |  (Other Services)|
 |  | Database |  |       |  (Kafka/Rabbit)|       |                  |
 |  | +------+ |  |       |                |       |                  |
 |  | |Outbox| |  |       |                |       |                  |
 |  | +------+ |  |       |                |       |                  |
 |  +----------+  |       |                |       |                  |
 |  [Publisher]   |       |                |       |                  |
 +-------+--------+       +-------+--------+       +--------+---------+
         |                        |                         |
         | 1. Write data + event  |                         |
         |    in one local DB txn |                         |
         |                        |                         |
         | 2. Publish event       |                         |
         |    from outbox table   |                         |
         |----------------------->|                         |
         | 3. Broker acks; mark   |                         |
         |    event as sent       |                         |
         |<-----------------------|                         |
         |                        | Deliver to consumers    |
         |                        |------------------------>|
Components
Service Database
Relational DB (e.g., PostgreSQL)
Stores business data and an outbox table for events in the same transactional context.
Outbox Table
Relational DB table
Holds events generated by the service, pending publishing.
Transactional Write Mechanism
Database transactions
Ensures atomicity of business data changes and event insertion.
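A minimal sketch of the transactional write, using Python's built-in `sqlite3` module as a stand-in for the service's relational database (PostgreSQL in this design). The `orders` table, the `place_order` function, and the `OrderPlaced` event type are hypothetical. The point is that both INSERTs share one transaction, so a failure between them leaves neither row behind.

```python
import json
import sqlite3
import time
import uuid

# Hypothetical schema: an `orders` business table plus the outbox table.
SCHEMA = """
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    id           TEXT PRIMARY KEY,
    aggregate_id TEXT NOT NULL,
    event_type   TEXT NOT NULL,
    payload      TEXT NOT NULL,   -- serialized JSON
    created_at   REAL NOT NULL,
    sent_at      REAL             -- NULL until published
);
"""

def place_order(conn: sqlite3.Connection, order_id: str, event_payload: dict) -> None:
    """Insert the business row and its event in ONE local transaction."""
    with conn:  # commits on success, rolls back both inserts on any exception
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "PLACED"))
        conn.execute(
            "INSERT INTO outbox (id, aggregate_id, event_type, payload, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPlaced", json.dumps(event_payload), time.time()),
        )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
place_order(conn, "o-1", {"order_id": "o-1", "total": 42})

# Simulate a failure after the business insert: this payload cannot be serialized.
try:
    place_order(conn, "o-2", {"order_id": "o-2", "bad": object()})
except TypeError:
    pass

print(conn.execute("SELECT id FROM orders").fetchall())            # [('o-1',)]
print(conn.execute("SELECT aggregate_id FROM outbox").fetchall())  # [('o-1',)]
```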
Event Publisher
Background worker or scheduler
Reads unsent events from the outbox, publishes them to the message broker, and marks each one as sent once the broker acknowledges it.
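The publisher's poll, publish, mark loop can be sketched as below, again with `sqlite3` standing in for the real database. `publish_pending` and the `publish(key, event)` callback are hypothetical; a real implementation would call a broker client (for example a Kafka producer, waiting for its delivery acknowledgement) and run this pass on a timer. Using the `aggregate_id` as the message key is an assumption that lets a keyed broker keep one aggregate's events in order.

```python
import json
import sqlite3
import time
from typing import Callable

def publish_pending(conn: sqlite3.Connection,
                    publish: Callable[[str, dict], None],
                    batch_size: int = 100) -> int:
    """One polling pass over the outbox.

    `publish(key, event)` is a placeholder for a broker producer call that
    returns only after the broker acknowledges. If it raises, the pass stops
    and the remaining rows stay unsent for the next pass.
    """
    rows = conn.execute(
        "SELECT id, aggregate_id, event_type, payload FROM outbox "
        "WHERE sent_at IS NULL ORDER BY created_at LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, aggregate_id, event_type, payload in rows:
        publish(aggregate_id, {"id": row_id, "type": event_type, "data": json.loads(payload)})
        with conn:  # mark as sent only after the ack
            conn.execute("UPDATE outbox SET sent_at = ? WHERE id = ?", (time.time(), row_id))
    return len(rows)

# Usage with an in-memory outbox and a list standing in for the broker.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregate_id TEXT, event_type TEXT, "
             "payload TEXT, created_at REAL, sent_at REAL)")
with conn:
    conn.executemany("INSERT INTO outbox VALUES (?, ?, ?, ?, ?, NULL)", [
        ("e1", "o-1", "OrderPlaced", json.dumps({"total": 42}), 1.0),
        ("e2", "o-2", "OrderPlaced", json.dumps({"total": 7}), 2.0),
    ])

broker = []

def send(key: str, event: dict) -> None:
    broker.append((key, event["id"]))

print(publish_pending(conn, send))  # 2
print(publish_pending(conn, send))  # 0 (nothing left to send)
print(broker)  # [('o-1', 'e1'), ('o-2', 'e2')]
```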
Message Broker
Kafka or RabbitMQ
Decouples event producers and consumers, ensures reliable event delivery.
Event Consumers
Other microservices
Consume and process events asynchronously.
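Consumer logic is out of scope for this design, but NFR4 depends on it: with at-least-once delivery, every consumer must tolerate duplicates. A common way to do that is the Idempotent Consumer pattern listed above, sketched here under assumptions: each event carries a unique `id` (such as the outbox row id), and the hypothetical `processed_events` and `loyalty_points` tables live in the consumer's own database (`sqlite3` again).

```python
import sqlite3

def handle_once(conn: sqlite3.Connection, event: dict, apply) -> bool:
    """Process `event` at most once per event id, even if the broker redelivers it.

    The dedup record and the side effect commit in the same local transaction,
    so a crash cannot record one without the other.
    """
    with conn:
        cur = conn.execute("INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
                           (event["id"],))
        if cur.rowcount == 0:
            return False  # id already recorded: skip the duplicate
        apply(conn, event)
    return True

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
CREATE TABLE loyalty_points (customer TEXT PRIMARY KEY, points INTEGER NOT NULL);
INSERT INTO loyalty_points VALUES ('c-1', 0);
""")

def add_points(conn: sqlite3.Connection, event: dict) -> None:
    conn.execute("UPDATE loyalty_points SET points = points + ? WHERE customer = ?",
                 (event["points"], event["customer"]))

evt = {"id": "e1", "customer": "c-1", "points": 10}
print(handle_once(conn, evt, add_points))  # True
print(handle_once(conn, evt, add_points))  # False (redelivery ignored)
print(conn.execute("SELECT points FROM loyalty_points").fetchone()[0])  # 10
```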
Monitoring and Logging
Prometheus, Grafana, ELK stack
Track event publishing success, failures, and system health.
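For FR6, two cheap signals catch most publishing failures: how many outbox rows are still unsent, and how old the oldest one is. A growing age means the publisher is stuck or too slow even when it logs no errors. The sketch computes both with one query; in this design they could be exported as gauges (for example with the Prometheus Python client's `Gauge`) and alerted on in Grafana. The function name and the alert threshold are hypothetical.

```python
import sqlite3

def outbox_lag(conn: sqlite3.Connection, now: float) -> dict:
    """Backlog size and age (seconds) of the oldest unsent outbox row."""
    pending, oldest = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM outbox WHERE sent_at IS NULL"
    ).fetchone()
    return {"pending": pending, "oldest_age_s": 0.0 if oldest is None else now - oldest}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, created_at REAL NOT NULL, sent_at REAL)")
with conn:
    conn.executemany("INSERT INTO outbox VALUES (?, ?, ?)", [
        ("e1", 90.0, 91.0),   # already sent: not counted
        ("e2", 100.0, None),
        ("e3", 105.0, None),
    ])

lag = outbox_lag(conn, now=110.0)
print(lag)  # {'pending': 2, 'oldest_age_s': 10.0}
if lag["oldest_age_s"] > 0.5:  # hypothetical threshold tied to NFR2's 500 ms p99
    print("ALERT: outbox publishing is falling behind")
```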
Request Flow
1. Client sends request to microservice to update data.
2. Microservice starts a database transaction.
3. Microservice updates the business data and inserts the corresponding event into the outbox table within the same transaction.
4. Transaction commits, ensuring both data and event are saved atomically.
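5. The event publisher periodically polls the outbox table for unsent events.
6. The event publisher reads those events and publishes them to the message broker.
7. After the broker acknowledges a publish, the event publisher marks the event as sent in the outbox table. If the publisher crashes between publishing and marking, the event is published again on the next poll, which is why delivery is at-least-once.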
8. Event consumers subscribe to the message broker and process events asynchronously.
9. Monitoring tools track event publishing metrics and alert on failures.
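Steps 6 and 7 are where at-least-once delivery comes from. The toy simulation below (in-memory `sqlite3`, a Python list standing in for the broker, all names hypothetical) crashes the publisher after the broker has accepted an event but before the row is marked sent. After the restart the row is still unsent, so it is published again: consumers see a duplicate, but the event is never lost (FR2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, payload TEXT, sent_at REAL)")
with conn:
    conn.execute("INSERT INTO outbox VALUES ('e1', '{}', NULL)")

delivered = []  # what consumers see on the broker

class Crash(Exception):
    """Simulated publisher process crash."""

def run_publisher(crash_before_mark: bool) -> None:
    for (event_id,) in conn.execute("SELECT id FROM outbox WHERE sent_at IS NULL").fetchall():
        delivered.append(event_id)  # step 6: broker has acknowledged the event
        if crash_before_mark:
            raise Crash             # process dies before step 7
        with conn:                  # step 7: mark as sent
            conn.execute("UPDATE outbox SET sent_at = 1 WHERE id = ?", (event_id,))

try:
    run_publisher(crash_before_mark=True)
except Crash:
    pass
run_publisher(crash_before_mark=False)  # restarted publisher re-reads the unsent row
print(delivered)  # ['e1', 'e1']: duplicated, never lost
```

Reversing steps 6 and 7 (mark first, then publish) would turn the same crash into a lost event, which is why the mark must come after the acknowledgement.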
Database Schema
Entities:
- BusinessData(id PK, data fields...)
- OutboxEvent(id PK, aggregate_id FK, event_type, payload JSON, created_at, sent_at nullable)
Relationships:
- OutboxEvent.aggregate_id references BusinessData.id
Notes:
- OutboxEvent stores serialized event data.
- sent_at is null until the event is published.
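The schema above translates to DDL roughly as follows. This sketch uses SQLite through Python's `sqlite3` module so it runs anywhere; on PostgreSQL the payload column would more naturally be `JSONB` and the timestamps `TIMESTAMPTZ`. The snake_case table names, the `name` column (standing in for the unspecified business fields), and the partial index are assumptions; the index is there to keep the publisher's `WHERE sent_at IS NULL` poll cheap.

```python
import sqlite3

DDL = """
CREATE TABLE business_data (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL            -- stands in for "data fields..."
);
CREATE TABLE outbox_event (
    id           INTEGER PRIMARY KEY,
    aggregate_id INTEGER NOT NULL REFERENCES business_data(id),
    event_type   TEXT NOT NULL,
    payload      TEXT NOT NULL,   -- serialized JSON
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    sent_at      TEXT             -- NULL until the event is published
);
-- Partial index: the publisher's poll only touches unsent rows.
CREATE INDEX outbox_unsent_idx ON outbox_event (created_at) WHERE sent_at IS NULL;
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)
with conn:
    conn.execute("INSERT INTO business_data (id, name) VALUES (1, 'widget')")
    conn.execute("INSERT INTO outbox_event (aggregate_id, event_type, payload) "
                 "VALUES (1, 'Created', '{}')")

# The FK rejects an event that points at a non-existent aggregate.
rejected = False
try:
    with conn:
        conn.execute("INSERT INTO outbox_event (aggregate_id, event_type, payload) "
                     "VALUES (99, 'Created', '{}')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```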
Scaling Discussion
Bottlenecks
The outbox table grows large, slowing polling and degrading overall database performance.
The event publisher becomes a bottleneck under high event throughput.
Message broker saturation or slow consumers cause backpressure.
Frequent writes cause database transaction contention.
Solutions
Periodically archive or purge sent events from the outbox table.
Scale the event publisher horizontally by partitioning (sharding) the outbox so that each publisher instance polls a disjoint subset of rows.
Use a high-throughput, distributed message broker such as Kafka, with topic partitioning.
Index the outbox for the publisher's poll query (for example, on created_at for unsent rows) and batch outbox inserts.
Handle backpressure and scale consumers out to keep up with the event load.
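The first two solutions can be combined as sketched below. Each outbox row carries a shard number derived from a stable hash of its `aggregate_id`; each publisher instance polls only its own shard, so instances never contend for the same rows and one aggregate's events stay in order within a shard; a periodic job deletes rows that were sent before a retention cutoff. The `shard` column, `NUM_SHARDS`, and the CRC32 hash are illustrative assumptions.

```python
import sqlite3
import zlib

NUM_SHARDS = 4  # hypothetical: one publisher instance per shard

def shard_of(aggregate_id: str) -> int:
    # CRC32 is stable across processes, unlike Python's randomized str hash().
    return zlib.crc32(aggregate_id.encode("utf-8")) % NUM_SHARDS

def enqueue(conn: sqlite3.Connection, aggregate_id: str, created_at: float) -> None:
    with conn:
        conn.execute("INSERT INTO outbox (aggregate_id, shard, created_at) VALUES (?, ?, ?)",
                     (aggregate_id, shard_of(aggregate_id), created_at))

def poll_shard(conn: sqlite3.Connection, shard: int, limit: int = 100) -> list:
    """What one publisher instance reads: unsent rows of its own shard, oldest first."""
    rows = conn.execute("SELECT id FROM outbox WHERE sent_at IS NULL AND shard = ? "
                        "ORDER BY created_at LIMIT ?", (shard, limit)).fetchall()
    return [row_id for (row_id,) in rows]

def purge_sent(conn: sqlite3.Connection, sent_before: float) -> int:
    """Delete rows published before the cutoff; unsent rows are never touched."""
    with conn:
        deleted = conn.execute("DELETE FROM outbox WHERE sent_at IS NOT NULL AND sent_at < ?",
                               (sent_before,)).rowcount
    return deleted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, aggregate_id TEXT NOT NULL, "
             "shard INTEGER NOT NULL, created_at REAL NOT NULL, sent_at REAL)")
for t, agg in enumerate(["o-1", "o-2", "o-3", "o-1"]):
    enqueue(conn, agg, float(t))

# Each id appears under exactly one shard; both 'o-1' rows (ids 1 and 4) share a shard.
polled = {s: poll_shard(conn, s) for s in range(NUM_SHARDS)}
print(polled)

with conn:
    conn.execute("UPDATE outbox SET sent_at = 10.0 WHERE id <= 3")  # pretend ids 1-3 were published
deleted = purge_sent(conn, sent_before=20.0)
print(deleted)  # 3
```

Changing `NUM_SHARDS` remaps aggregates to different shards, so resizing needs a drain or rebalancing step to preserve per-aggregate ordering.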
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying assumptions, 20 minutes designing the architecture and data flow, 10 minutes discussing scaling and trade-offs, and 5 minutes summarizing.
Explain the dual-write problem: a database update and a message broker publish cannot be made atomic without a shared transaction, so one can succeed while the other fails.
Describe how the outbox pattern solves this with a single database transaction.
Discuss the role of the event publisher and message broker.
Mention how eventual consistency is achieved and why at-least-once delivery is acceptable.
Highlight monitoring and retry mechanisms for reliability.
Address scaling challenges and solutions.