| Users | Monolith | Microservices |
|---|---|---|
| 100 users | Single app server handles all logic; simple deployment | Multiple small services; overhead of inter-service calls |
| 10K users | App server CPU/memory stressed; DB load increases; deployment slower | Services can be scaled independently; network latency starts to matter |
| 1M users | Single DB becomes bottleneck; app server overloaded; hard to deploy fast | Services scaled horizontally; DB sharding or replicas per service; complex orchestration |
| 100M users | Monolith likely fails; scaling limited; high downtime risk | Highly scalable; services distributed globally; complex monitoring and tracing needed |

Monolith vs. microservices: scaling approaches compared

For monoliths, the database is usually the first bottleneck: every feature reads and writes through the same application and the same database, so load and contention concentrate in one place.

For microservices, the network becomes the bottleneck: as the number of services grows, each request triggers more inter-service calls, which increases latency and operational complexity.
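
To make the inter-service latency point concrete, here is a minimal sketch of how per-call latency accumulates. The hop latencies are hypothetical numbers chosen for illustration, not measurements, and the function names are made up for this example.

```python
from typing import List


def sequential_latency_ms(hop_latencies_ms: List[int]) -> int:
    """Calls made one after another: the latencies add up."""
    return sum(hop_latencies_ms)


def parallel_latency_ms(hop_latencies_ms: List[int]) -> int:
    """Independent calls made concurrently: the slowest call dominates."""
    return max(hop_latencies_ms)


# Hypothetical latencies (ms) for four downstream service calls.
hops = [5, 12, 8, 20]

print(sequential_latency_ms(hops))  # 45
print(parallel_latency_ms(hops))    # 20
```

A request that chains calls pays the sum of the hops, which is one reason asynchronous or concurrent communication is often preferred where the calls do not depend on each other.
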
- Monolith: Vertical scaling (bigger servers), database read replicas, caching layers, and eventually splitting into microservices.
- Microservices: Horizontal scaling of individual services, database sharding per service, asynchronous messaging, API gateways, and service mesh for communication management.
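
Of the solutions listed above, a caching layer is the simplest to illustrate. The following is a minimal cache-aside sketch, assuming an in-process dictionary stands in for a real cache and `fake_db_lookup` stands in for a real database query. Both names, and the `CacheAside` class itself, are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class CacheAside:
    """Tiny in-process cache-aside wrapper with a time-to-live per entry."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float = 60.0):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        # key -> (expiry timestamp, cached value)
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: serve from the cache, no database call.
            self.hits += 1
            return entry[1]
        # Missing or expired: load from the database, then populate the cache.
        self.misses += 1
        value = self._load_from_db(key)
        self._entries[key] = (now + self._ttl, value)
        return value


db_calls: List[str] = []


def fake_db_lookup(user_id: str) -> str:
    """Hypothetical stand-in for a database query; records each call."""
    db_calls.append(user_id)
    return f"profile-for-{user_id}"


cache = CacheAside(fake_db_lookup, ttl_seconds=60.0)
for _ in range(5):
    cache.get("user-42")

print(len(db_calls), cache.hits, cache.misses)  # 1 4 1
```

Five reads of the same key result in a single database call. A production setup would more likely use a shared cache so that all app servers benefit, and would need an invalidation strategy for writes.
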
At 1M users, assuming every user sends 1 request per second (a deliberately pessimistic upper bound, since real users are rarely all active at once):
- Requests per second: 1,000,000
- Monolith: Requires very powerful servers and large DB clusters; high risk of downtime.
- Microservices: Distributed load across many smaller servers; network bandwidth and inter-service calls increase costs.
- Storage: Microservices may duplicate some data per service, increasing storage needs.
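
The estimate above can be turned into a rough server count. This is a back-of-the-envelope sketch: the per-server throughput of 2,000 requests per second and the 70% target utilization are assumptions picked for illustration, and `servers_needed` is a hypothetical helper.

```python
import math


def servers_needed(total_rps: int, rps_per_server: int,
                   target_utilization_pct: int = 70) -> int:
    """Servers required if each one should run at or below the target utilization."""
    usable_rps = rps_per_server * target_utilization_pct // 100
    return math.ceil(total_rps / usable_rps)


users = 1_000_000
requests_per_user_per_second = 1  # assumption taken from the estimate above
total_rps = users * requests_per_user_per_second

print(total_rps)  # 1000000

# Assumed capacity: 2,000 requests per second per server.
print(servers_needed(total_rps, rps_per_server=2_000))  # 715
```

The same arithmetic applies to both architectures. The difference is that a monolith must scale every one of those servers as a full copy of the application, while microservices can add capacity only to the services that are actually under load.
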
Start by explaining the differences in architecture and scaling points. Discuss bottlenecks clearly for each approach. Then propose specific scaling solutions matching those bottlenecks. Use real numbers to show understanding of limits.

Your database handles 1,000 QPS. Traffic grows 10x, to roughly 10,000 QPS. What do you do first?

Answer: Add read replicas to spread the read load and add a cache to cut the number of queries that reach the database. Try both before moving to more complex solutions such as sharding.
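
The answer can be checked with quick arithmetic. The sketch below assumes a read-heavy workload (95% reads), an 80% cache hit rate, and replicas that each handle the same 1,000 QPS as the original database. All three figures are assumptions for illustration, and the helper functions are hypothetical.

```python
import math
from typing import Tuple


def db_load_after_caching(total_qps: int, read_pct: int,
                          cache_hit_pct: int) -> Tuple[int, int]:
    """Return (reads that still reach the database, writes) in QPS."""
    reads = total_qps * read_pct // 100
    writes = total_qps - reads
    reads_reaching_db = reads * (100 - cache_hit_pct) // 100
    return reads_reaching_db, writes


def replicas_needed(reads_reaching_db: int, qps_per_replica: int) -> int:
    """Read replicas required to absorb the reads that miss the cache."""
    return math.ceil(reads_reaching_db / qps_per_replica)


total_qps = 1_000 * 10  # 10x growth from 1,000 QPS

db_reads, db_writes = db_load_after_caching(total_qps, read_pct=95, cache_hit_pct=80)

print(db_reads, db_writes)                               # 1900 500
print(replicas_needed(db_reads, qps_per_replica=1_000))  # 2
```

Under these assumptions the cache absorbs most reads, two replicas cover the rest, and the primary only sees 500 QPS of writes. Neither caching nor read replicas reduces write load, so a write-heavy workload would need a different first step.
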