| Users/Traffic | What Changes? |
|---|---|
| 100 users | Simple async messaging between services; low message volume; eventual consistency delays are minimal and unnoticeable. |
| 10,000 users | Message queues grow; need for retry and dead-letter queues; monitoring of message lag; some delays in data sync become visible. |
| 1,000,000 users | High message throughput; message brokers become bottleneck; need partitioning and scaling of queues; conflict resolution logic needed for data divergence. |
| 100,000,000 users | Massive distributed messaging; multi-region replication; complex conflict resolution; eventual consistency delays impact user experience; advanced monitoring and alerting required. |
## Eventual Consistency Handling in Microservices - Scalability & System Analysis
The message broker or event queue becomes the first bottleneck as message volume grows. It can get overwhelmed by high throughput, causing delays and message backlogs. This slows down data synchronization between microservices, increasing eventual consistency delays.
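This backlog dynamic can be sketched with a toy calculation (all rates here are hypothetical, chosen only for illustration): whenever messages arrive faster than the broker drains them, queue depth, and therefore consistency lag, grows linearly with time.

```python
def backlog_after(seconds, arrival_rate, service_rate, initial_backlog=0):
    """Queue depth after `seconds` when messages arrive faster than the
    broker can process them (rates in messages/sec)."""
    growth = max(arrival_rate - service_rate, 0)
    return initial_backlog + growth * seconds

# Hypothetical load: 120k msgs/sec arriving, broker drains 100k msgs/sec.
backlog = backlog_after(seconds=60, arrival_rate=120_000, service_rate=100_000)
print(backlog)             # messages queued after one minute of overload
print(backlog / 100_000)   # ~12 extra seconds of lag to drain once arrivals subside
```

The point of the sketch: a sustained 20% overload turns into minutes of staleness within minutes, which is why broker capacity has to be scaled ahead of peak load rather than at it.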
- Horizontal scaling: Add more instances of message brokers and partition topics to distribute load.
- Sharding: Partition data and messages by key to reduce contention and improve parallelism.
- Caching: Use caches to serve read requests quickly while waiting for eventual consistency.
- Idempotency and retries: Implement idempotent consumers and retry mechanisms to handle failures gracefully.
- Conflict resolution: Use versioning, timestamps, or CRDTs (Conflict-free Replicated Data Types) to resolve data conflicts.
- Monitoring and alerting: Track message lag, queue sizes, and processing times to detect bottlenecks early.
- Multi-region replication: Deploy brokers and services closer to users to reduce latency.
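The idempotency and conflict-resolution points above can be combined in one small sketch. This is illustrative only: the `processed_ids` set and the in-memory `store` stand in for durable storage, and last-write-wins by timestamp is just one of the resolution strategies listed (versioning and CRDTs are alternatives).

```python
processed_ids = set()   # dedupe ledger (a DB table or Redis set in practice)
store = {}              # key -> (timestamp, value)

def handle_message(msg):
    """Apply a message at most once; resolve conflicts by last-write-wins.

    `msg` is assumed to carry a unique `id`, a target `key`, an event
    timestamp `ts`, and a `value`.
    """
    if msg["id"] in processed_ids:
        return False  # duplicate delivery (e.g. from a retry) -- safely ignored
    current = store.get(msg["key"])
    # Last-write-wins: only apply if this event is newer than stored state.
    if current is None or msg["ts"] > current[0]:
        store[msg["key"]] = (msg["ts"], msg["value"])
    processed_ids.add(msg["id"])
    return True
```

With this shape, redelivering the same message is a no-op, and a late-arriving stale event cannot overwrite newer state -- exactly the two failure modes retries and eventual consistency introduce.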
- At 1M users, assume 10 requests per user per minute: 10M requests/minute ÷ 60 ≈ 167,000 requests/sec.
- Each request may generate 1 to 3 messages, so at the high end the message broker must handle ~500,000 messages/sec.
- A single Kafka broker can sustain on the order of 100,000 messages/sec (heavily dependent on message size, batching, and replication factor), so ~5 brokers with partitioned topics are a minimum; plan extra headroom for spikes.
- Storage for event logs grows rapidly; plan for terabytes per day depending on message size.
- Network bandwidth must support message replication; 1 Gbps link ~125 MB/s; plan multiple links or cloud bandwidth.
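The estimates above can be checked with a short capacity script. The average message size of 1 KB is an assumption added here for the bandwidth line; the other constants come from the figures above.

```python
import math

# Back-of-envelope capacity math from the estimates above.
USERS = 1_000_000
REQS_PER_USER_PER_MIN = 10
MSGS_PER_REQUEST = 3             # upper bound from the estimate above
BROKER_MSGS_PER_SEC = 100_000    # assumed per-broker Kafka throughput
AVG_MSG_BYTES = 1_000            # assumed average message size (not from the text)
LINK_MB_PER_SEC = 125            # 1 Gbps link ~ 125 MB/s

requests_per_sec = USERS * REQS_PER_USER_PER_MIN / 60
messages_per_sec = USERS * REQS_PER_USER_PER_MIN * MSGS_PER_REQUEST / 60
brokers_needed = math.ceil(messages_per_sec / BROKER_MSGS_PER_SEC)
bandwidth_mb_s = messages_per_sec * AVG_MSG_BYTES / 1e6
links_needed = math.ceil(bandwidth_mb_s / LINK_MB_PER_SEC)

print(f"{requests_per_sec:,.0f} req/s, {messages_per_sec:,.0f} msg/s")
print(f"{brokers_needed} brokers minimum, {links_needed} x 1 Gbps links for replication")
```

Under these assumptions the replication traffic alone (~500 MB/s) exceeds a single 1 Gbps link several times over, which is why the estimate calls for multiple links or cloud-scale bandwidth.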
Start by explaining what eventual consistency means in microservices. Then identify the main bottleneck (message broker). Discuss how message volume grows with users and how that affects latency. Propose scaling solutions like partitioning, retries, and conflict resolution. Finally, mention monitoring and user experience trade-offs.
Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: At 10,000 QPS the database is the likely bottleneck. First, measure where the load goes: if it is read-heavy (the common case), add read replicas to spread reads and a cache to absorb repeated queries before they reach the database. In parallel, optimize slow queries and use connection pooling. Reserve vertical scaling and sharding for after those cheaper options are exhausted.