Authentication at the Gateway Level in Microservices: Scalability & System Analysis

| Users | Requests per Second (RPS) | Gateway Load | Auth Service Load | Latency Impact | Scaling Needs |
|---|---|---|---|---|---|
| 100 users | ~10 RPS | Low, single gateway instance | Low, single auth instance | Negligible | Basic setup, no scaling needed |
| 10,000 users | ~1,000 RPS | Moderate, gateway CPU & memory increase | Moderate, auth service CPU & DB queries increase | Small latency increase possible | Start load balancing gateways, caching tokens |
| 1,000,000 users | ~100,000 RPS | High, multiple gateway instances behind LB | High, auth DB and token validation bottleneck | Noticeable latency if no caching | Horizontal scaling, token caching, DB replicas |
| 100,000,000 users | ~10,000,000 RPS | Very high, globally distributed gateways | Very high, sharded auth DB, distributed cache | Latency critical, must optimize | Global load balancing, CDN for static tokens, sharding, microservice partitioning |
As request volume grows, the first bottleneck is the authentication database and token validation service: every request at the gateway requires token verification, which involves database lookups or cryptographic operations. The gateway itself scales horizontally with ease, but the auth service and its database can become overwhelmed at high QPS.
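To make the bottleneck concrete, here is a minimal sketch of the naive flow, where every gateway request triggers an auth-DB lookup. The `AUTH_DB` dict and `db_lookups` counter are hypothetical stand-ins for a real database and its query metrics:

```python
# Hypothetical in-memory stand-in for the auth database.
AUTH_DB = {"token-abc": {"user_id": 42, "active": True}}
db_lookups = 0  # counts how many requests reach the database

def validate_token_naive(token: str) -> bool:
    """Naive gateway check: every single request triggers an auth-DB lookup."""
    global db_lookups
    db_lookups += 1
    record = AUTH_DB.get(token)
    return record is not None and record["active"]

# All 3 requests below hit the database, so db_lookups grows linearly with RPS.
for _ in range(3):
    validate_token_naive("token-abc")
```

At 100,000 RPS this pattern means 100,000 database queries per second for authentication alone, which is exactly the load the strategies below are designed to remove.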
- Horizontal Scaling: Add more gateway instances behind a load balancer to handle more concurrent connections.
- Token Caching: Cache validated tokens in a fast in-memory store (e.g., Redis) to reduce DB hits.
- Read Replicas: Use read replicas for the authentication database to spread read load.
- Stateless Tokens: Use JWT or similar tokens that can be validated without DB calls.
- Sharding: Partition the authentication data by user segments to reduce DB contention.
- Global Distribution: Deploy gateways and caches close to users to reduce latency.
- Rate Limiting: Protect the auth service from overload by limiting requests per user/IP.
Back-of-envelope capacity estimate, assuming 1 million users generating ~100,000 RPS:
- Each gateway server handles ~5,000 RPS → Need ~20 gateway servers.
- Auth DB handles ~10,000 QPS max → Need at least 10 read replicas or use stateless tokens.
- Token cache (Redis) handles ~100,000 ops/sec → Single Redis cluster can suffice.
- Network bandwidth: 100,000 RPS x 1 KB/request ≈ 100 MB/s (~800 Mbps), requires 1 Gbps network links.
- Storage: Auth DB stores user credentials and tokens, grows with user base; sharding helps manage size.
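The arithmetic behind these estimates can be checked directly. The per-server and per-DB-node capacities below are the same assumptions used in the list above, not measured figures:

```python
# Back-of-envelope check of the capacity estimates (all capacities assumed).
total_rps = 100_000          # 1M users -> ~100,000 RPS
rps_per_gateway = 5_000      # assumed capacity of one gateway server
db_qps_capacity = 10_000     # assumed max QPS of one auth-DB node
bytes_per_request = 1_024    # ~1 KB per request

gateways_needed = -(-total_rps // rps_per_gateway)        # ceiling division
db_replicas_needed = -(-total_rps // db_qps_capacity)
bandwidth_mbps = total_rps * bytes_per_request * 8 / 1e6  # bits per second -> Mbps

print(gateways_needed, db_replicas_needed, round(bandwidth_mbps))
```

This reproduces the numbers above: 20 gateway servers, 10 DB replicas, and roughly 820 Mbps of traffic, comfortably inside a 1 Gbps link.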
When walking through this in an interview, start by explaining the authentication flow at the gateway. Identify the bottleneck (the auth DB and token validation). Discuss scaling the gateway horizontally first, then caching tokens to reduce DB load. Mention stateless tokens (e.g., JWTs) to avoid DB calls entirely. Finally, cover global distribution and rate limiting for very large scale.
Your database handles 1,000 QPS and traffic grows 10x to 10,000 QPS. What do you do first?
Answer: First reduce the query load itself: implement token caching or switch to stateless tokens (such as JWTs) so most requests never touch the database, and add read replicas to distribute the remaining read load. This relieves pressure on the database before resorting to more hardware.
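The stateless-token idea can be sketched with a minimal HS256 JWT built from the standard library: the gateway verifies the signature and expiry locally, so no auth-DB query is needed per request. The secret, claim names, and helper functions here are illustrative (real deployments would use a vetted library such as PyJWT and key management):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"hypothetical-shared-secret"  # would come from config/KMS in practice

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_jwt(user_id: int, ttl: int = 3600) -> str:
    """Sign a minimal HS256 JWT; no database row is written for the session."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": user_id, "exp": int(time.time()) + ttl}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str):
    """Validate signature and expiry locally -- no auth-DB query needed."""
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return None
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or wrongly signed token
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        return None  # expired token
    return claims["sub"]
```

The trade-off: validation costs one HMAC computation instead of a DB round trip, but revoking an individual token before expiry requires extra machinery (short TTLs or a small revocation list), which is why caching and stateless tokens are often combined.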