Outbox pattern for reliable events in Microservices - Scalability & System Analysis

| Scale | Event Volume | Database Load | Message Broker Load | Latency | Complexity & Mitigations |
|---|---|---|---|---|---|
| 100 users | ~10 events/sec | Single DB instance handles writes and outbox inserts easily | Low; broker handles messages without delay | Near real-time event delivery | Simple polling or trigger-based outbox processing |
| 10K users | ~1K events/sec | DB write load increases; outbox table grows; polling frequency needs tuning | Moderate; message batching may help | Latency may increase slightly due to polling intervals | Introduce batching, optimize DB indexes, use connection pooling |
| 1M users | ~100K events/sec | DB becomes the bottleneck for writes and outbox reads; outbox table grows large | High; requires partitioning and scaling | Latency rises if outbox processing lags | DB sharding, outbox table partitioning, asynchronous processing, multiple outbox processors |
| 100M users | ~10M events/sec | A single DB cannot handle the load; requires a distributed DB or per-service databases | Must be highly scalable (e.g., Kafka clusters with partitions) | Latency-critical; minimize delays with parallelism and backpressure handling | Full horizontal scaling, event streaming platforms, advanced monitoring and alerting |
The database is the first bottleneck because the outbox pattern writes events to the same database as the business data, inside the same transaction. As event volume grows, both write throughput and the repeated scanning of the outbox table for unpublished events become heavy, increasing latency and potentially blocking business transactions.
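A minimal sketch of the pattern's write side, using `sqlite3` as a stand-in database; the `orders`/`outbox` schema and column names are illustrative assumptions, not a prescribed design. The key point is that the business row and the outbox row commit in one transaction.

```python
# Outbox pattern write side: business write + event insert in ONE transaction.
# sqlite3 is a stand-in for the service's real database; names are illustrative.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")

def create_order(total: float) -> int:
    # The outbox insert shares the order's transaction, so the event is
    # recorded if and only if the business commit succeeds.
    with conn:
        cur = conn.execute("INSERT INTO orders (total) VALUES (?)", (total,))
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "total": total})),
        )
    return order_id

order_id = create_order(42.50)
pending = conn.execute("SELECT COUNT(*) FROM outbox WHERE published = 0").fetchone()[0]
print(order_id, pending)  # 1 1
```

A separate processor later reads the unpublished rows and forwards them to the broker, which is where the scanning load described above comes from.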
- Database optimization: Add indexes on outbox table, use partitioning to manage large tables.
- Connection pooling: Efficiently reuse DB connections to handle more concurrent writes.
- Horizontal scaling: Shard the database by user or tenant to distribute load.
- Multiple outbox processors: Run parallel workers to read and publish events faster.
- Asynchronous processing: Decouple event publishing from main transaction commit using background jobs.
- Message broker scaling: Use partitioned, distributed brokers like Kafka to handle high throughput.
- Caching: Cache event metadata if needed to reduce DB reads.
- Monitoring and alerting: Track lag in outbox processing to prevent delays.
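The batching and parallel-processor steps above can be sketched as a polling loop. This is a single-worker toy against `sqlite3` with a stubbed `publish()`; a real deployment would use a broker client and a claim mechanism (e.g., a status column or row locks) so multiple workers don't double-read a batch.

```python
# Hedged sketch of a batched, polling outbox processor. publish() is a
# stand-in for a broker client; the schema is an illustrative assumption.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0)""")
with conn:
    conn.executemany("INSERT INTO outbox (payload) VALUES (?)",
                     [(f"event-{i}",) for i in range(10)])

published = []

def publish(payload: str) -> None:
    published.append(payload)  # stand-in for broker.send(topic, payload)

def drain_outbox(batch_size: int = 4) -> int:
    """Poll unpublished rows in batches until none remain; return count published."""
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM outbox WHERE published = 0 "
            "ORDER BY id LIMIT ?", (batch_size,)).fetchall()
        if not rows:
            return total
        for _row_id, payload in rows:
            publish(payload)
        # Rows are marked only after publishing; a crash in between causes
        # re-delivery, so downstream consumers must be idempotent.
        with conn:
            conn.executemany("UPDATE outbox SET published = 1 WHERE id = ?",
                             [(row_id,) for row_id, _ in rows])
        total += len(rows)

count = drain_outbox()
print(count)  # 10
```

At-least-once delivery is inherent here: marking before publishing risks losing events, marking after risks duplicates, and the pattern deliberately chooses duplicates.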
- At 1K events/sec, the DB must absorb ~1K business writes/sec plus ~1K outbox inserts/sec (roughly 2K writes/sec total).
- Outbox table size grows by ~86.4M rows/day (1K events/sec * 86400 sec).
- Assuming ~1KB per event row, storage grows by ~86GB/day (86.4M rows * 1KB); requires archiving or partitioning.
- Message broker bandwidth depends on event size; 1KB * 1K events/sec = ~1MB/sec (~8Mbps).
- At 100K events/sec, DB and broker bandwidth scale 100x; requires distributed systems.
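The figures above can be reproduced with a quick back-of-envelope script (assuming 1 event ≈ 1 KB and decimal units):

```python
# Back-of-envelope load estimate for the 10K-user tier (~1K events/sec).
events_per_sec = 1_000
seconds_per_day = 86_400
event_size_bytes = 1_000  # assumed ~1 KB per event row

rows_per_day = events_per_sec * seconds_per_day                  # outbox growth
storage_per_day_gb = rows_per_day * event_size_bytes / 1e9       # daily storage
bandwidth_mb_per_sec = events_per_sec * event_size_bytes / 1e6   # broker ingress

print(rows_per_day)            # 86400000 rows/day
print(storage_per_day_gb)      # 86.4 GB/day
print(bandwidth_mb_per_sec)    # 1.0 MB/s (~8 Mbps)
```

Scaling `events_per_sec` to 100K multiplies every figure by 100, which is why that tier forces distributed storage and a partitioned broker.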
Start by explaining the outbox pattern and its purpose for reliable event delivery. Then discuss how the database is the first bottleneck as event volume grows. Outline scaling steps: optimize DB, add parallel processors, shard DB, and scale message broker. Emphasize monitoring lag and ensuring eventual consistency. Use clear examples and quantify load to show understanding.
Your database handles 1000 QPS for outbox writes and event reads. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: The first action is to optimize the database by adding indexes and partitioning the outbox table to handle increased writes and reads efficiently. Then introduce multiple parallel outbox processors to publish events faster. This addresses the DB bottleneck before scaling horizontally or upgrading infrastructure.
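One concrete version of the "add indexes first" step: an index on `(published, id)` lets the processor's poll query seek directly to unpublished rows instead of scanning the whole table. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` to show the scan turning into an index search; the table and index names are illustrative.

```python
# Demonstrate why indexing the outbox poll query is the cheap first win:
# without an index the poll is a full table scan; with one it is an index seek.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    published INTEGER NOT NULL DEFAULT 0)""")

POLL = "SELECT id FROM outbox WHERE published = 0 ORDER BY id LIMIT 100"

plan_before = conn.execute("EXPLAIN QUERY PLAN " + POLL).fetchall()

conn.execute("CREATE INDEX idx_outbox_unpublished ON outbox (published, id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + POLL).fetchall()

print(plan_before)  # plan contains a full SCAN of outbox
print(plan_after)   # plan uses idx_outbox_unpublished instead
```

On databases that support them, a partial index (`WHERE published = 0`) shrinks the index further, since published rows are the vast majority over time.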