
REST API between services in Microservices - Scalability & System Analysis

Scalability Analysis - REST API between services
Growth Table: REST API Between Services
Users/Traffic     | API Requests/sec       | Latency              | Service Instances                          | Network Load | Data Volume
100 users         | ~50-100 RPS            | Low (10-50 ms)       | 1-2 per service                            | Low          | Small
10,000 users      | ~5,000-10,000 RPS      | Moderate (50-100 ms) | 3-5 per service                            | Moderate     | Medium
1,000,000 users   | ~500,000-1,000,000 RPS | Higher (100-200 ms)  | 10+ per service, autoscaling               | High         | Large
100,000,000 users | ~50,000,000+ RPS       | High (200 ms+)       | Hundreds of instances, global distribution | Very High    | Very Large
First Bottleneck

At low scale, the first bottleneck is usually the API gateway or load balancer handling incoming REST calls. As traffic grows, the service instances' CPU and memory become the bottleneck, since every request must be parsed, processed, and serialized. At medium scale, network bandwidth between services can limit throughput. At very large scale, the database or other stateful dependencies accessed via REST APIs become the main bottleneck.
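This progression can be sketched as a toy capacity model: give each tier an assumed throughput ceiling and ask which one saturates first as traffic grows. The tier names and the RPS limits below are illustrative assumptions, not benchmarks.

```python
# Toy capacity model: which tier saturates first as traffic grows?
# All capacity numbers are illustrative assumptions, not measured limits.
TIER_CAPACITY_RPS = {
    "api_gateway": 20_000,   # single gateway / load balancer
    "service_cpu": 50_000,   # combined CPU budget of service instances
    "network": 150_000,      # inter-service bandwidth budget
    "database": 80_000,      # stateful backend dependency
}

def first_bottleneck(traffic_rps: int):
    """Return the first tier pushed over capacity, or None if all have headroom."""
    saturated = {t: cap for t, cap in TIER_CAPACITY_RPS.items() if traffic_rps > cap}
    if not saturated:
        return None
    # The tier with the smallest capacity is the first to saturate.
    return min(saturated, key=saturated.get)

print(first_bottleneck(10_000))   # None: everything has headroom
print(first_bottleneck(60_000))   # api_gateway saturates first
```

Scaling a tier (e.g. adding gateway replicas) just raises its entry in the table, after which the next-smallest ceiling becomes the new bottleneck.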

Scaling Solutions
  • Horizontal scaling: Add more service instances behind load balancers to distribute REST API calls.
  • API Gateway optimization: Use caching, rate limiting, and request aggregation to reduce load.
  • Asynchronous communication: Use message queues or event streams to reduce synchronous REST calls.
  • Service partitioning: Split services by domain or function to reduce inter-service calls.
  • Network improvements: Use faster network links, service mesh with optimized routing.
  • Database scaling: Use read replicas, caching layers, and sharding to reduce backend bottlenecks.
  • CDN: For REST APIs serving static or cacheable content, use CDN to offload traffic.
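The asynchronous-communication point above can be sketched with an in-process queue standing in for a real broker such as RabbitMQ or Kafka (the service names and event shape are hypothetical): the producer publishes an event and returns immediately instead of blocking on a synchronous REST call.

```python
# Sketch: decoupling services with a queue instead of a synchronous REST call.
# queue.Queue is a stand-in for a real message broker; names are illustrative.
import queue
import threading

events = queue.Queue()
processed = []

def order_service(order_id: int) -> None:
    # Publish an event and return immediately -- no blocking REST call
    # to the notification service.
    events.put({"type": "order_created", "order_id": order_id})

def notification_worker() -> None:
    # Consumer drains events at its own pace, decoupled from request latency.
    while True:
        event = events.get()
        if event is None:  # sentinel to stop the worker
            break
        processed.append(event["order_id"])

worker = threading.Thread(target=notification_worker)
worker.start()
for oid in range(3):
    order_service(oid)
events.put(None)
worker.join()
print(processed)  # [0, 1, 2]
```

The trade-off named in the list applies here: the producer gains latency and fault isolation, but delivery becomes eventually consistent rather than immediate.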
Back-of-Envelope Cost Analysis

Assuming 10,000 users generating 10,000 RPS:

  • Each instance handles ~3,000 RPS, so 10,000 / 3,000 ≈ 3.3, rounded up to ~4 instances per service.
  • Network bandwidth: 10,000 RPS * 1 KB/request = ~10 MB/s per service.
  • Storage: Logs and metrics grow with traffic; plan for scalable storage.
  • CPU and memory scale linearly with request volume; monitor and autoscale.
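The arithmetic above can be checked with a few lines of Python; the 3,000 RPS per instance and 1 KB per request figures are the same assumptions used in the bullets.

```python
# Back-of-envelope sizing for 10,000 RPS.
# Assumptions (from the bullets above): 3,000 RPS per instance, 1 KB/request.
import math

RPS_TOTAL = 10_000
RPS_PER_INSTANCE = 3_000
REQUEST_KB = 1

instances = math.ceil(RPS_TOTAL / RPS_PER_INSTANCE)
bandwidth_mb_s = RPS_TOTAL * REQUEST_KB / 1_000  # KB/s -> MB/s

print(instances)       # 4 instances per service
print(bandwidth_mb_s)  # 10.0 MB/s per service
```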
Interview Tip

Structure your scalability discussion by:

  1. Defining expected traffic and usage patterns.
  2. Identifying bottlenecks at each scale step.
  3. Proposing targeted solutions for each bottleneck.
  4. Considering trade-offs like cost, complexity, and latency.
  5. Discussing monitoring and autoscaling strategies.
Self Check

Your database handles 1000 QPS. Traffic grows 10x. What do you do first?

Answer: Add read replicas and implement caching to reduce direct database load before scaling vertically or sharding.
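A minimal read-through cache illustrates why caching is the first move: repeated reads of the same key hit the cache instead of the database. Here `db_query`, the TTL, and the in-memory dict are all stand-ins for a real database call and a cache such as Redis.

```python
# Minimal read-through cache sketch. `db_query` is a hypothetical stand-in
# for a real database read; the TTL and dict storage are illustrative.
import time

TTL_SECONDS = 60
cache = {}      # key -> (timestamp, value)
db_calls = 0

def db_query(key: str) -> str:
    global db_calls
    db_calls += 1
    return f"value-for-{key}"

def get(key: str) -> str:
    now = time.monotonic()
    hit = cache.get(key)
    if hit is not None and now - hit[0] < TTL_SECONDS:
        return hit[1]              # cache hit: no database load
    value = db_query(key)          # cache miss: one database read
    cache[key] = (now, value)
    return value

for _ in range(1000):
    get("user:42")
print(db_calls)  # 1 -- the other 999 reads were absorbed by the cache
```

With a hot working set, most of the 10x traffic growth lands on the cache, keeping database QPS near its original level while replicas or sharding are planned.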

Key Result
REST APIs between services scale well with horizontal service instances and load balancing, but the first bottleneck is usually service CPU and network bandwidth. Database and backend dependencies become bottlenecks at large scale, requiring caching, read replicas, and sharding.