| Users | Traffic Characteristics | System Behavior | Degradation Strategy |
|---|---|---|---|
| 100 users | Low requests, low concurrency | All services fully operational | No degradation needed |
| 10,000 users | Moderate requests, some spikes | Minor latency in non-critical services | Disable non-essential features temporarily |
| 1,000,000 users | High requests, frequent spikes | Some services slow or partially unavailable | Fallback to cached data, limit feature set, circuit breakers active |
| 100,000,000 users | Very high sustained traffic | Critical services prioritized, degraded UI, partial data | Full graceful degradation: disable heavy features, serve static content, queue requests |
Graceful degradation in Microservices - Scalability & System Analysis
Under high load, the first bottleneck in a microservice system is usually a downstream dependency or database. When a service waits on a slow or overloaded dependency, its own requests back up, and the delay cascades to its callers as higher latency and timeouts.
Network congestion and CPU saturation on critical services also appear early as bottlenecks.
- Circuit Breakers: Automatically stop calls to failing services to prevent cascading failures.
- Fallbacks: Serve cached or default data when a service is slow or down.
- Feature Flags: Disable non-critical features dynamically to reduce load.
- Load Shedding: Reject or delay low priority requests during overload.
- Horizontal Scaling: Add more instances of critical services to handle load.
- Asynchronous Processing: Queue requests for heavy operations to smooth spikes.
- CDN and Caching: Offload static content and cache responses to reduce backend load.
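
The first two strategies in the list above, circuit breakers and fallbacks, can be sketched together. This is a minimal illustration rather than a production implementation: the `CircuitBreaker` class, its `max_failures` and `reset_after` parameters, and the injectable `clock` are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds (assumed value)
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, allow a trial call through.
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            # Short-circuit: do not add load to the failing service.
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # A success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, load_cached_profile)`, where both function names are placeholders. While the breaker is open, the failing service receives no traffic and users see the fallback.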
Assuming 1 million users with 1 request per second each:
- Requests per second: 1,000,000 QPS total.
- Single service capacity: One instance handles ~5,000 QPS.
- Instances needed: ~200 instances for critical services.
- Database load: 10,000 QPS max per instance; use read replicas and caching.
- Network bandwidth: 1 Gbps = 125 MB/s; estimate average request size to calculate total bandwidth.
- Storage: Cache storage for fallback data; size depends on data freshness and volume.
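
The arithmetic behind these estimates can be written out as a short script. The user count, per-instance capacities, and link speed come from the list above. The average request size and cache hit percentage are assumptions added only to make the bandwidth and database figures concrete.

```python
import math

USERS = 1_000_000
REQUESTS_PER_USER_PER_SEC = 1
INSTANCE_QPS = 5_000              # per-instance capacity from the estimate
DB_INSTANCE_QPS = 10_000          # per-database-instance ceiling from the estimate
LINK_BYTES_PER_SEC = 125_000_000  # 1 Gbps = 125 MB/s

# Assumptions for illustration only (not part of the estimate above):
AVG_REQUEST_BYTES = 2_000         # hypothetical average request size
CACHE_HIT_PERCENT = 90            # hypothetical share of reads served from cache

total_qps = USERS * REQUESTS_PER_USER_PER_SEC
service_instances = math.ceil(total_qps / INSTANCE_QPS)

# Only cache misses reach the database.
db_qps = total_qps * (100 - CACHE_HIT_PERCENT) // 100
db_instances = math.ceil(db_qps / DB_INSTANCE_QPS)

bandwidth_bytes_per_sec = total_qps * AVG_REQUEST_BYTES
links_needed = math.ceil(bandwidth_bytes_per_sec / LINK_BYTES_PER_SEC)

print(f"total QPS:         {total_qps:,}")          # 1,000,000
print(f"service instances: {service_instances}")    # 200
print(f"database QPS:      {db_qps:,}")             # 100,000
print(f"database replicas: {db_instances}")         # 10
print(f"1 Gbps links:      {links_needed}")         # 16
```

Under these assumed values, caching is what keeps the database tier small: without it, the full 1,000,000 QPS would need about 100 database instances at 10,000 QPS each.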
When discussing graceful degradation, start by identifying critical vs non-critical services. Explain how to detect overload and failures early. Describe fallback mechanisms and circuit breakers clearly. Show understanding of user experience impact and how to prioritize features. Discuss trade-offs between availability and consistency. Use real examples like disabling image loading or showing cached data.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas and implement caching to reduce direct database load before scaling vertically or horizontally.
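
As a rough sketch of the caching half of that answer, a small read-through cache with a time-to-live keeps repeated reads off the database. The `ReadThroughCache` class, its `ttl_seconds` parameter, and the `load` callback are hypothetical names for this example. A real deployment would more likely use a shared cache such as Redis or Memcached than an in-process dictionary.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """Hypothetical in-process cache: serves fresh entries, reloads stale ones."""

    def __init__(self, load: Callable[[Hashable], Any],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load          # function that queries the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # cache hit: no database query
        value = self._load(key)    # cache miss or stale entry: query once
        self._entries[key] = (value, now)
        return value
```

With this shape, the database sees at most one query per key per TTL window from each process, so the hit ratio rather than the raw traffic determines database load.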
Practice
graceful degradation in microservices?
Solution
Step 1: Understand the concept of graceful degradation
Graceful degradation means the system continues to work even if some parts fail, but with limited features.
Step 2: Identify the goal in microservices context
In microservices, it ensures users still get a response, possibly a simpler or fallback one, instead of a total failure.
Final Answer:
To keep the system running with reduced functionality during failures -> Option C
Quick Check:
Graceful degradation = reduced functionality during failure [OK]
- Thinking graceful degradation means full system shutdown
- Confusing graceful degradation with scaling techniques
- Assuming it replaces microservices with monolith
Solution
Step 1: Identify how graceful degradation handles failures
It uses fallback responses or simpler data to keep the system responsive.
Step 2: Match the option that uses fallback
"Use a fallback response when the called service is unavailable" keeps the caller responsive while the dependency is down, so it is the correct choice.
Final Answer:
Use a fallback response when the called service is unavailable -> Option D
Quick Check:
Fallback response = graceful degradation [OK]
- Stopping entire request instead of fallback
- Ignoring failure without response
- Treating a cluster restart as graceful degradation
response = callService()
if response == null:
    response = getCachedData()
return response

What will be returned if callService() fails?
Solution
Step 1: Analyze the code flow when callService() fails
If callService() returns null (failure), the code fetches cached data as fallback.
Step 2: Determine the returned value
The fallback cached data is returned instead of null or error.
Final Answer:
Cached data as fallback -> Option A
Quick Check:
Fallback cached data returned on failure [OK]
- Assuming error message is returned
- Thinking null is returned directly
- Confusing empty string with fallback data
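
The pseudocode in this question can be turned into a small runnable Python function. This is a sketch under two assumptions: `call_service` and `get_cached_data` are hypothetical callables standing in for `callService()` and `getCachedData()`, and a raised exception is treated as a failure in the same way as a missing (`None`) response, which goes slightly beyond the original snippet.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def get_with_fallback(call_service: Callable[[], Optional[T]],
                      get_cached_data: Callable[[], T]) -> T:
    """Return the live response, or cached data if the service fails."""
    try:
        response = call_service()
    except Exception:
        # Assumption: an exception counts as a failure, like a None result.
        response = None
    if response is None:
        response = get_cached_data()
    return response
```

For example, `get_with_fallback(lambda: None, lambda: "cached")` returns `"cached"`, which matches the quiz answer: the caller receives cached data rather than `None` or an error.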
try {
    data = fetchFromService()
} catch (Exception e) {
    data = null
}
return data.toString()

What is the main problem with this code?
Solution
Step 1: Understand exception handling and return statement
If fetchFromService() fails, data is set to null, then data.toString() is called.
Step 2: Identify the error caused by calling toString() on null
Calling toString() on null causes a runtime NullPointerException or similar error.
Final Answer:
It returns null.toString() causing a runtime error -> Option B
Quick Check:
Calling toString() on null causes error [OK]
- Ignoring null check before toString()
- Assuming exception is handled fully
- Thinking it retries infinitely
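
One way to repair the snippet in this question is to check for the missing value before converting it. The Python sketch below assumes a hypothetical `fetch_from_service` callable and a hypothetical `default` placeholder string. Note that Python behaves differently from the Java-style original: `str(None)` does not raise, it silently yields the text `"None"`, so the explicit check is still needed to avoid showing that to users.

```python
from typing import Any, Callable


def render(fetch_from_service: Callable[[], Any],
           default: str = "unavailable") -> str:
    """Return the service data as text, or a default if the call fails."""
    try:
        data = fetch_from_service()
    except Exception:
        data = None
    if data is None:
        # Degrade gracefully: return a placeholder instead of failing
        # or leaking the literal string "None" to the user.
        return default
    return str(data)
```

The `default` value is the degraded response. Depending on the feature, it could instead be cached data or an empty section of the page.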
Solution
Step 1: Understand graceful degradation for critical service failure
When the payment service fails, the system should still respond with limited information rather than block or return an error.
Step 2: Evaluate options for best graceful degradation
Returning a simplified confirmation without payment details, and logging the failure for retry, preserves the user experience and system reliability.
Final Answer:
Return a simplified confirmation without payment details and log failure for retry -> Option A
Quick Check:
Simplified response + retry = graceful degradation [OK]
- Blocking entire process on failure
- Sending immediate error without fallback
- Removing critical service entirely
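
The chosen option, a simplified confirmation plus a logged failure for retry, might look like the sketch below. Everything here is hypothetical: the `confirm_order` function, the `charge` callable standing in for the payment service, the in-memory `retry_queue`, and the shape of the returned dictionary. A real system would use a durable queue so that pending payments survive a restart.

```python
import logging
from collections import deque
from typing import Any, Callable, Deque, Dict

logger = logging.getLogger("orders")

# Assumption: an in-memory queue stands in for a durable retry queue.
retry_queue: Deque[str] = deque()


def confirm_order(order_id: str,
                  charge: Callable[[str], Any]) -> Dict[str, Any]:
    """Confirm an order, degrading to 'payment pending' if payment fails."""
    try:
        receipt = charge(order_id)
    except Exception as exc:
        # Log the failure and schedule a retry instead of failing the order.
        logger.warning("payment failed for order %s: %s", order_id, exc)
        retry_queue.append(order_id)
        return {"order_id": order_id, "status": "received",
                "payment": "pending"}
    return {"order_id": order_id, "status": "confirmed", "payment": receipt}
```

The user still gets an immediate response in both cases. Only the payment details are omitted while the payment service is down.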
