| Users | System Changes |
|---|---|
| 100 users | Direct service calls; simple routing; minimal latency; no gateway needed |
| 10,000 users | Multiple microservices; need unified access; increased request volume; API gateway introduced for routing and security |
| 1,000,000 users | High concurrency; gateway handles load balancing, authentication, rate limiting; caching added; gateway scales horizontally |
| 100,000,000 users | Global distribution; multiple API gateway clusters; CDN integration; advanced traffic shaping; microservices sharded; gateway handles failover and analytics |
Why API Gateways Unify Service Access in Microservices: Scalability Evidence
As user requests grow, the API gateway becomes the first bottleneck because it handles all incoming traffic to multiple microservices. It must route, authenticate, and apply policies for every request. Without scaling, the gateway's CPU, memory, or network bandwidth limits will cause increased latency and dropped requests.
- Horizontal Scaling: Add more gateway instances behind a load balancer to distribute traffic.
- Caching: Cache common responses at the gateway to reduce backend calls.
- Rate Limiting: Protect backend services by limiting requests per user or IP.
- Edge Deployment: Deploy gateways closer to users (regional clusters) to reduce latency.
- Offload SSL/TLS: Terminate encryption at the gateway to reduce backend load.
- Use CDN: For static content, reduce gateway load by serving from CDN.
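Of the strategies above, rate limiting is the easiest to illustrate in code. Below is a minimal sketch of a per-client token bucket, the classic algorithm gateways use to cap requests per user or IP; the `rate` and `capacity` values and the `gateway_allow` helper are illustrative assumptions, not part of any specific gateway's API.

```python
import time

class TokenBucket:
    """Per-client token bucket: refills `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed interval, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client key (user ID or IP), kept in gateway memory.
buckets: dict[str, TokenBucket] = {}

def gateway_allow(client: str, rate: float = 5.0, capacity: int = 10) -> bool:
    """Admit or reject a request for `client` under an assumed 5 RPS / burst-10 policy."""
    bucket = buckets.setdefault(client, TokenBucket(rate, capacity))
    return bucket.allow()
```

In a horizontally scaled deployment, each gateway instance holding its own buckets means the effective limit is roughly `rate × instances` per client; a shared store (e.g. Redis) is the usual fix when exact global limits matter.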
- At 1M users, assuming 1 request per second each = 1M RPS total.
- One gateway instance handles ~5,000 RPS → need ~200 instances.
- Network bandwidth per gateway: 1 Gbps (~125 MB/s) can carry ~12,500 requests of 10 KB each per second in theory; ~10,000 RPS is a more realistic figure once protocol overhead is included.
- Storage at gateway is minimal (caching few GBs), but backend storage grows with data.
- Cost scales with number of gateway instances, bandwidth, and caching infrastructure.
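The capacity figures above are back-of-the-envelope arithmetic, which is easy to make explicit. This sketch recomputes them; every input (1 RPS per user, 5,000 RPS per instance, 10 KB per request) is an assumption carried over from the list, not a measured number.

```python
import math

# Assumed workload, taken from the figures above.
total_users = 1_000_000
rps_per_user = 1                        # assumption: 1 request/sec per user
total_rps = total_users * rps_per_user  # 1,000,000 RPS aggregate

# Assumed per-instance capacity -> instance count.
rps_per_instance = 5_000
instances = math.ceil(total_rps / rps_per_instance)  # 200 gateway instances

# Bandwidth ceiling for a single 1 Gbps NIC with 10 KB requests.
bandwidth_bytes_per_sec = 1_000_000_000 / 8   # ~125 MB/s
request_bytes = 10_000                        # 10 KB per request
rps_bandwidth_limit = bandwidth_bytes_per_sec / request_bytes  # 12,500 RPS theoretical

print(instances)            # 200
print(rps_bandwidth_limit)  # 12500.0
```

Since the bandwidth ceiling (~12,500 RPS) sits above the assumed CPU-bound capacity (5,000 RPS), CPU, not the network, is the binding constraint per instance under these assumptions.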
How to structure the answer:
- Start by explaining the role of the API gateway in unifying access.
- Discuss how traffic growth impacts the gateway first.
- Identify bottlenecks like CPU, memory, and network.
- Propose scaling solutions like horizontal scaling and caching.
- Mention trade-offs and monitoring needs.
- Use clear examples and numbers to support your points.
Your API gateway handles 1,000 requests per second. Traffic grows 10x to 10,000 RPS. What do you do first and why?