Graceful degradation in Microservices - Scalability & System Analysis

Scalability Analysis - Graceful degradation

Growth Table: Graceful Degradation in Microservices

Users	Traffic Characteristics	System Behavior	Degradation Strategy
100 users	Low requests, low concurrency	All services fully operational	No degradation needed
10,000 users	Moderate requests, some spikes	Minor latency in non-critical services	Disable non-essential features temporarily
1,000,000 users	High requests, frequent spikes	Some services slow or partially unavailable	Fall back to cached data, limit feature set, circuit breakers active
100,000,000 users	Very high sustained traffic	Critical services prioritized, degraded UI, partial data	Full graceful degradation: disable heavy features, serve static content, queue requests
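The table is organized by user count, but in a running system each row is usually triggered by measured load rather than by a user tally. Below is a minimal sketch that assumes utilization (the fraction of provisioned capacity in use) is the signal. The thresholds, tier names, and three-level priority scheme are hypothetical and were chosen only to mirror the table's rows. `admit` shows how load shedding tightens as the tier rises.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Degradation tiers, loosely mirroring the rows of the growth table."""
    FULL = 0           # all services fully operational
    TRIM = 1           # disable non-essential features
    FALLBACK = 2       # cached data, limited feature set, breakers active
    CRITICAL_ONLY = 3  # static content, queued heavy work, critical paths only


# Hypothetical thresholds on utilization (fraction of provisioned capacity),
# checked from most to least severe.
THRESHOLDS = [
    (0.95, Tier.CRITICAL_ONLY),
    (0.85, Tier.FALLBACK),
    (0.70, Tier.TRIM),
]

# Hypothetical request priorities: a lower number means more important.
CRITICAL, NORMAL, BACKGROUND = 0, 1, 2


def choose_tier(utilization: float) -> Tier:
    """Return the most severe tier whose threshold has been crossed."""
    for threshold, tier in THRESHOLDS:
        if utilization >= threshold:
            return tier
    return Tier.FULL


def admit(priority: int, tier: Tier) -> bool:
    """Load shedding: the higher the tier, the fewer priorities are admitted."""
    if tier >= Tier.CRITICAL_ONLY:
        return priority == CRITICAL
    if tier >= Tier.FALLBACK:
        return priority <= NORMAL
    return True
```

For example, at 90% utilization `choose_tier` returns `Tier.FALLBACK`. At that tier, background work is rejected and normal and critical requests are still served.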

First Bottleneck

In microservices, the first bottleneck under high load is usually a downstream dependency: another service or a database. When a service calls a slow or overloaded dependency, its own requests wait, its threads and connections fill up, and the delay cascades back to every upstream caller. The result is higher latency and, eventually, timeouts.

Network congestion and CPU saturation on critical services also appear early as bottlenecks.
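Bounding every downstream call with a timeout is the basic defense against cascading delays. The caller gives up after a fixed budget and can degrade instead of waiting. Below is a minimal Python sketch that uses only the standard library. `slow_inventory_service`, `get_stock`, and the 100 ms budget are hypothetical. A real service would normally set the timeout on its HTTP or RPC client rather than wrap calls in a thread pool.

```python
import concurrent.futures
import time
from typing import Optional

# Shared worker pool for outbound calls (size is arbitrary for the sketch).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def slow_inventory_service(item_id: str) -> int:
    """Hypothetical overloaded dependency: answers, but only after 500 ms."""
    time.sleep(0.5)
    return 42


def get_stock(item_id: str, timeout_s: float = 0.1) -> Optional[int]:
    """Call the dependency with a deadline; return None rather than keep waiting.

    None tells the caller to degrade (e.g. show "availability unknown")
    instead of holding its own request open behind the slow service.
    """
    future = _pool.submit(slow_inventory_service, item_id)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Python threads cannot be killed, so the worker finishes in the
        # background; what matters is that this caller is released on time.
        return None
```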

Scaling Solutions for Graceful Degradation

Circuit Breakers: Automatically stop calls to failing services to prevent cascading failures.
Fallbacks: Serve cached or default data when a service is slow or down.
Feature Flags: Disable non-critical features dynamically to reduce load.
Load Shedding: Reject or delay low-priority requests during overload.
Horizontal Scaling: Add more instances of critical services to handle load.
Asynchronous Processing: Queue requests for heavy operations to smooth spikes.
CDN and Caching: Offload static content and cache responses to reduce backend load.
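The circuit breaker and fallback items above can be combined in a few lines. This is a minimal, single-threaded sketch, not any particular library's API. The class name, `failure_threshold`, `reset_timeout_s`, and the injectable `clock` are assumptions made for illustration. After a run of consecutive failures, the breaker opens and serves the fallback (for example, cached data) without calling the dependency. After a cool-down, it lets calls through again and closes on the first success.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit breaker.

    CLOSED: calls go through; consecutive failures are counted.
    OPEN: after `failure_threshold` failures, calls skip the dependency
          and go straight to the fallback.
    HALF_OPEN: once `reset_timeout_s` has passed, calls are tried again;
               a success closes the breaker, a failure re-opens it.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "OPEN":
            return fallback()  # fail fast: leave the struggling service alone
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success: close the breaker
        return result
```

A typical call looks like `breaker.call(lambda: pricing_client.get(sku), lambda: cached_prices[sku])`, where `pricing_client` and `cached_prices` are hypothetical. The sketch is not thread-safe. With concurrent callers, the counters would need a lock, and several trial calls could slip through in the half-open state.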

Back-of-Envelope Cost Analysis

Assuming 1 million users with 1 request per second each:

Requests per second: 1,000,000 QPS total.
Single service capacity: One instance handles ~5,000 QPS.
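Instances needed: 1,000,000 / 5,000 = ~200 instances for critical services, before adding any headroom.
Database load: one database instance handles at most ~10,000 QPS, so sending every request to the database would take ~100 instances; use read replicas and caching to cut that down.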
Network bandwidth: 1 Gbps = 125 MB/s; estimate average request size to calculate total bandwidth.
Storage: Cache storage for fallback data; size depends on data freshness and volume.
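The estimate above can be reproduced as a short calculation. The QPS and per-instance capacities come from the text. `AVG_REQUEST_BYTES` (2 KB) is a hypothetical placeholder, because the text leaves the request size open. No headroom for spikes or failover is included.

```python
import math

# Inputs from the estimate above.
USERS = 1_000_000
REQS_PER_USER_PER_S = 1
SERVICE_QPS_PER_INSTANCE = 5_000
DB_QPS_PER_INSTANCE = 10_000
# Hypothetical: the text leaves average request size open; 2 KB is a placeholder.
AVG_REQUEST_BYTES = 2_000

total_qps = USERS * REQS_PER_USER_PER_S                              # 1,000,000 QPS
service_instances = math.ceil(total_qps / SERVICE_QPS_PER_INSTANCE)  # no headroom
db_instances_uncached = math.ceil(total_qps / DB_QPS_PER_INSTANCE)   # every request hits the DB
bandwidth_gbps = total_qps * AVG_REQUEST_BYTES * 8 / 1e9             # bytes/s -> Gbit/s


def db_instances_needed(cache_hit_pct: int) -> int:
    """DB instances needed when a cache absorbs `cache_hit_pct` percent of reads."""
    misses_qps = total_qps * (100 - cache_hit_pct) // 100
    return math.ceil(misses_qps / DB_QPS_PER_INSTANCE)


print(f"{total_qps:,} QPS -> {service_instances} service instances")
print(f"DB instances: {db_instances_uncached} uncached, {db_instances_needed(90)} at 90% hit rate")
print(f"Bandwidth: {bandwidth_gbps:.0f} Gbps (~{math.ceil(bandwidth_gbps)} x 1 Gbps links)")
```

With these inputs, the script reports 200 service instances and 100 database instances if every request reached the database. That figure drops to 10 at a 90% cache hit rate. Traffic comes to about 16 Gbps, far beyond a single 1 Gbps link.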

Interview Tip

When discussing graceful degradation, start by identifying critical vs non-critical services. Explain how to detect overload and failures early. Describe fallback mechanisms and circuit breakers clearly. Show understanding of user experience impact and how to prioritize features. Discuss trade-offs between availability and consistency. Use real examples like disabling image loading or showing cached data.
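The tip's examples of disabling image loading and showing cached data can be expressed with feature flags. This is a minimal sketch that assumes two hypothetical flags and a hypothetical in-memory cache of recommendations. The product fields, URL, and `fetch_live_recommendations` stub are placeholders. Critical data (title and price) is always returned. The flagged features fall back to a placeholder or to cached data, and the page is marked as degraded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeatureFlags:
    """Hypothetical runtime flags, flipped by an operator or an overload signal."""
    load_images: bool = True
    live_recommendations: bool = True


@dataclass
class ProductPage:
    title: str
    price: str
    image_url: Optional[str] = None  # None -> the UI shows a placeholder
    recommendations: List[str] = field(default_factory=list)
    degraded: bool = False


# Hypothetical last-known-good data, refreshed while the system is healthy.
CACHED_RECOMMENDATIONS = {"sku-1": ["sku-7", "sku-9"]}


def fetch_live_recommendations(sku: str) -> List[str]:
    """Stand-in for an expensive, non-critical downstream call."""
    return ["sku-3", "sku-4", "sku-5"]


def render_product(sku: str, flags: FeatureFlags) -> ProductPage:
    # Critical content is always served, whatever the flags say.
    page = ProductPage(title=f"Product {sku}", price="19.99")
    if flags.load_images:
        page.image_url = f"https://img.example.com/{sku}.jpg"
    else:
        page.degraded = True
    if flags.live_recommendations:
        page.recommendations = fetch_live_recommendations(sku)
    else:
        page.recommendations = CACHED_RECOMMENDATIONS.get(sku, [])  # cached fallback
        page.degraded = True
    return page
```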

Self-Check Question

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Assuming the extra traffic is mostly reads, first add a caching layer and read replicas to take load off the primary database, before scaling the primary itself vertically or sharding it.
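Caching reduces database load only to the extent that reads hit the cache. At a 90% hit rate, 10,000 QPS becomes roughly 1,000 QPS at the database, which is back within the original capacity. Below is a minimal cache-aside sketch with a per-entry TTL, assuming a read-heavy workload. `load_from_db` stands in for a real query. The unbounded dict is a simplification: production caches such as Redis or Memcached add eviction and run out of process.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class CacheAside:
    """Cache-aside (lazy loading) with a per-entry TTL.

    Reads check the cache first and go to the database only on a miss or an
    expired entry, so repeated reads of hot keys stop reaching the database.
    """

    def __init__(self, load_from_db: Callable[[Hashable], Any], ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_from_db = load_from_db  # hypothetical stand-in for a real query
        self.ttl_s = ttl_s                # maximum staleness we accept
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (loaded_at, value)
        self.db_reads = 0

    def get(self, key: Hashable) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            return entry[1]  # fresh hit: no database work
        value = self.load_from_db(key)
        self.db_reads += 1
        self._entries[key] = (now, value)
        return value


# 10,000 reads spread over 100 hot keys reach the database only 100 times.
cache = CacheAside(lambda key: f"row-{key}", ttl_s=60.0)
for i in range(10_000):
    cache.get(i % 100)
print(cache.db_reads)  # 100
```

The TTL is the staleness budget. A longer TTL absorbs more reads but serves older data, which is the availability versus consistency trade-off that graceful degradation accepts.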

Key Result

Graceful degradation helps microservices maintain core functionality under heavy load by disabling or limiting non-critical features, using fallbacks, and preventing cascading failures with circuit breakers.

Practice

1. What is the main goal of graceful degradation in microservices?

easy

A. To increase the number of microservices for better scaling

B. To immediately stop all services when one fails

C. To keep the system running with reduced functionality during failures

D. To replace microservices with a monolithic architecture

Solution

Step 1: Understand the concept of graceful degradation

Step 2: Identify the goal in microservices context

Final Answer: C. To keep the system running with reduced functionality during failures

Quick Check:

Solution

Step 1: Identify how graceful degradation handles failures

Step 2: Match the option that uses fallback

Final Answer:

Quick Check:

Solution

Step 1: Analyze the code flow when callService() fails

Step 2: Determine the returned value

Final Answer:

Quick Check:

Solution

Step 1: Understand exception handling and return statement

Step 2: Identify the error caused by calling toString() on null

Final Answer:

Quick Check:

Solution

Step 1: Understand graceful degradation for critical service failure

Step 2: Evaluate options for best graceful degradation

Final Answer:

Quick Check: