
Graceful degradation in Microservices - Scalability & System Analysis

Growth Table: Graceful Degradation in Microservices

| Users | Traffic Characteristics | System Behavior | Degradation Strategy |
| --- | --- | --- | --- |
| 100 | Low request volume, low concurrency | All services fully operational | No degradation needed |
| 10,000 | Moderate request volume, some spikes | Minor latency in non-critical services | Disable non-essential features temporarily |
| 1,000,000 | High request volume, frequent spikes | Some services slow or partially unavailable | Fall back to cached data, limit feature set, circuit breakers active |
| 100,000,000 | Very high sustained traffic | Critical services prioritized, degraded UI, partial data | Full graceful degradation: disable heavy features, serve static content, queue requests |
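
The tiers in the table can be expressed as a simple lookup. This is an illustrative sketch only: the thresholds come from the table above, but the tier descriptions and the idea of keying on active-user count are assumptions, not a real system's policy.

```python
def degradation_tier(active_users):
    """Map current load to a degradation tier (thresholds from the
    growth table; an illustrative policy, not a production one)."""
    if active_users <= 100:
        return "full service"
    if active_users <= 10_000:
        return "disable non-essential features"
    if active_users <= 1_000_000:
        return "serve cached data, limit features, circuit breakers on"
    return "core features only: static content, queued requests"
```

In practice these thresholds would be driven by measured latency and error rates rather than raw user counts, and toggled via feature flags so they can change without a deploy.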
First Bottleneck

In microservices, the first bottleneck under high load is usually a downstream dependency: a dependent service or a database. When a service calls another that is slow or overloaded, the delay cascades upward through every caller, increasing end-to-end latency and eventually triggering timeouts.

Network congestion and CPU saturation on critical services also appear early as bottlenecks.
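
One common defense against cascading delays is to put a hard deadline on every downstream call. A minimal sketch using Python's standard library (the function name and fallback value are illustrative assumptions):

```python
import concurrent.futures

def call_with_timeout(func, timeout_s, fallback):
    """Bound a downstream call so one slow dependency cannot
    stall the caller and cascade the delay upstream."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback  # degrade instead of hanging
    finally:
        pool.shutdown(wait=False)  # do not block on a stuck worker
```

Real frameworks set these deadlines at the RPC or HTTP-client layer, but the principle is the same: a bounded wait plus a fallback converts a downstream outage into a degraded response rather than a cascading one.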

Scaling Solutions for Graceful Degradation
  • Circuit Breakers: Automatically stop calls to failing services to prevent cascading failures.
  • Fallbacks: Serve cached or default data when a service is slow or down.
  • Feature Flags: Disable non-critical features dynamically to reduce load.
  • Load Shedding: Reject or delay low priority requests during overload.
  • Horizontal Scaling: Add more instances of critical services to handle load.
  • Asynchronous Processing: Queue requests for heavy operations to smooth spikes.
  • CDN and Caching: Offload static content and cache responses to reduce backend load.
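
The first two strategies, circuit breakers and fallbacks, combine naturally: the breaker decides whether to attempt the call, and the fallback supplies the degraded response. A minimal sketch, with assumed thresholds (class name and parameters are illustrative, not from any specific library):

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; while open,
    skips the downstream call and serves the fallback. After
    `reset_timeout` seconds it allows one trial call (half-open)."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback()  # open: don't hit the failing service
            self.opened_at = None  # half-open: permit one trial call
            self.failures = 0
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()
        self.failures = 0  # success resets the failure count
        return result
```

Production libraries add failure-rate windows, per-endpoint state, and metrics, but the core state machine (closed, open, half-open) is what interviewers expect you to describe.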
Back-of-Envelope Cost Analysis

Assuming 1 million users with 1 request per second each:

  • Total request rate: 1,000,000 QPS.
  • Single service capacity: One instance handles ~5,000 QPS.
  • Instances needed: ~200 instances for critical services.
  • Database load: 10,000 QPS max per instance; use read replicas and caching.
  • Network bandwidth: 1 Gbps = 125 MB/s; estimate average request size to calculate total bandwidth.
  • Storage: Cache storage for fallback data; size depends on data freshness and volume.
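
Recomputing the estimates above; the capacities are the assumed figures from the bullet list, and the average response size is an extra assumption added here to make the bandwidth bullet concrete:

```python
total_qps = 1_000_000             # 1M users x 1 request/sec
per_instance_qps = 5_000          # assumed capacity of one instance
service_instances = total_qps / per_instance_qps      # 200.0

db_qps_limit = 10_000             # assumed max per database instance
db_nodes_without_cache = total_qps / db_qps_limit     # 100.0 -> why replicas + caching matter

avg_response_kb = 10              # assumed average response size
egress_mb_per_s = total_qps * avg_response_kb / 1024  # ~9766 MB/s
gbps_links = egress_mb_per_s / 125                    # 1 Gbps ~= 125 MB/s -> ~78 links
```

The database line is the point of the exercise: without caching, the raw load would need ~100 database nodes, which is why the strategy is to absorb reads in caches and replicas rather than scale the primary.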
Interview Tip

When discussing graceful degradation, start by identifying critical vs non-critical services. Explain how to detect overload and failures early. Describe fallback mechanisms and circuit breakers clearly. Show understanding of user experience impact and how to prioritize features. Discuss trade-offs between availability and consistency. Use real examples like disabling image loading or showing cached data.

Self-Check Question

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas and implement caching to reduce direct database load before scaling vertically or horizontally.
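
The caching half of that answer is usually implemented as cache-aside. A minimal sketch, assuming an in-process dict as the cache and a `db_fetch` callable standing in for the real database read (both are illustrative):

```python
import time

_cache = {}          # key -> (value, stored_at); stand-in for Redis etc.
TTL_SECONDS = 60     # assumed freshness window

def get_user(user_id, db_fetch):
    """Cache-aside read: serve from cache while fresh,
    fall through to the database on a miss."""
    entry = _cache.get(user_id)
    if entry is not None:
        value, stored_at = entry
        if time.monotonic() - stored_at < TTL_SECONDS:
            return value  # cache hit: no database query
    value = db_fetch(user_id)            # miss: one read to DB/replica
    _cache[user_id] = (value, time.monotonic())
    return value
```

With a high hit rate, most of the 10,000 QPS never reaches the database; read replicas then absorb the remaining miss traffic.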

Key Result
Graceful degradation helps microservices maintain core functionality under heavy load by disabling or limiting non-critical features, using fallbacks, and preventing cascading failures with circuit breakers.