| Users/Traffic | API Gateway Load | Latency Impact | Security & Routing | Infrastructure Changes |
|---|---|---|---|---|
| 100 users | Low requests per second (RPS), single instance handles well | Minimal latency, simple routing | Basic authentication and rate limiting | Single server or cloud function |
| 10,000 users | Moderate RPS, single instance may start to saturate | Latency slightly increases, need optimized routing | Enhanced security policies, throttling | Load balancer added, possible multiple instances |
| 1 million users | High RPS, single instance insufficient | Latency sensitive, need caching and optimized paths | Advanced security (OAuth, JWT), API versioning | Horizontal scaling, distributed gateway cluster |
| 100 million users | Very high RPS, requires global distribution | Latency critical, edge caching and CDN integration | Multi-tenant security, dynamic routing, throttling | Global load balancers, multi-region clusters, CDN |
## API gateway concept in HLD - Scalability & System Analysis
The API gateway's CPU and memory become the first bottleneck as traffic grows: the gateway must handle every incoming request, performing routing, authentication, rate limiting, and sometimes payload transformation. At moderate to high traffic, a single gateway instance can no longer keep up, causing rising latency and dropped requests.
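To make the per-request cost concrete, here is a minimal single-instance gateway sketch: every request pays for authentication, rate limiting, and routing on the same CPU, which is why the gateway tends to saturate before the backends do. All names, the `"secret"` API key, and the limits are illustrative assumptions, not a real gateway's API.

```python
import time

# Hypothetical single-instance gateway: auth, rate limiting, and routing
# all run per-request on one machine's CPU.

ROUTES = {"/orders": "orders-service", "/users": "users-service"}
RATE_LIMIT = 5          # max requests per client per window (assumed value)
WINDOW_SECONDS = 1.0

request_counts = {}      # client_id -> (window_start, count)

def handle(client_id, api_key, path, now=None):
    now = time.monotonic() if now is None else now
    # 1. Authentication (stand-in for real token validation)
    if api_key != "secret":
        return (401, "unauthorized")
    # 2. Rate limiting: fixed window per client
    start, count = request_counts.get(client_id, (now, 0))
    if now - start >= WINDOW_SECONDS:
        start, count = now, 0   # window expired: reset the counter
    if count >= RATE_LIMIT:
        return (429, "rate limited")
    request_counts[client_id] = (start, count + 1)
    # 3. Routing: map the path to a backend service
    backend = ROUTES.get(path)
    if backend is None:
        return (404, "no route")
    return (200, backend)
```

Each of these steps is cheap individually, but at tens of thousands of RPS their combined cost is exactly the CPU bottleneck described above.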
- Horizontal Scaling: Add multiple API gateway instances behind a load balancer to distribute traffic.
- Caching: Use response caching at the gateway or integrate with CDN to reduce backend load.
- Rate Limiting & Throttling: Protect backend services by limiting requests per user or IP.
- Edge Deployment: Deploy gateways closer to users globally to reduce latency.
- Service Mesh Integration: For internal microservices, use service mesh to offload routing and security.
- API Versioning & Routing Optimization: Efficient routing rules reduce processing time.
- At 1 million users, assuming 1 request per second per user, the API gateway handles ~1 million RPS.
- One gateway instance handles roughly 3,000-5,000 requests (or concurrent connections) per second, so ~1 million RPS needs about 200-330 gateway instances.
- Network bandwidth: 1 Gbps ≈ 125 MB/s; multiply average request/response size by total RPS to estimate bandwidth needs.
- Storage is minimal at gateway level, mostly logs and cache; scale storage for logs accordingly.
- Cost grows with number of instances, bandwidth, and caching infrastructure.
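The estimates above can be checked with quick back-of-envelope arithmetic. The average payload size (10 KB) is an assumed figure added for illustration; the other numbers come from the bullets.

```python
# Back-of-envelope capacity estimate using the numbers above.
users = 1_000_000
rps_per_user = 1
total_rps = users * rps_per_user                       # 1,000,000 RPS

per_instance_capacity = 5_000                          # optimistic end of 3k-5k
instances_needed = total_rps // per_instance_capacity  # 200 instances

avg_payload_bytes = 10 * 1024                          # assumed 10 KB avg
bandwidth_bytes_per_sec = total_rps * avg_payload_bytes
bandwidth_gbps = bandwidth_bytes_per_sec * 8 / 1e9     # ~82 Gbps

print(instances_needed, round(bandwidth_gbps, 1))
```

At the conservative end (3,000 RPS per instance) the count rises to ~330 instances, which is where the 200-330 range comes from; the ~82 Gbps figure shows why a single network link is never enough at this scale.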
Start by explaining the API gateway role and its responsibilities. Discuss traffic growth impact on CPU, memory, and network. Identify the first bottleneck clearly. Then propose scaling solutions step-by-step: horizontal scaling, caching, edge deployment. Mention trade-offs and cost implications. Use real numbers to show understanding.
Your API gateway handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add more API gateway instances behind a load balancer to horizontally scale and distribute the increased load, preventing CPU/memory saturation and reducing latency.
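A quick sizing check for this 10x scenario, assuming the same ~3,000-5,000 QPS per-instance capacity used earlier (the 50% headroom factor is an assumed rule of thumb):

```python
import math

# How many gateway instances does 10,000 QPS need?
target_qps = 10_000
per_instance = 3_000          # conservative end of the assumed 3k-5k range
instances = math.ceil(target_qps / per_instance)   # 4 instances
with_headroom = math.ceil(instances * 1.5)         # ~50% headroom for spikes
print(instances, with_headroom)
```

Quoting a concrete number like "4 instances, 6 with headroom" alongside the answer demonstrates the back-of-envelope reasoning interviewers look for.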