Rollback strategies in Microservices - System Design Guide


Problem Statement

When a new version of a microservice is deployed and contains bugs or causes failures, the system can become unstable or unusable. Without a clear rollback plan, fixing these issues can take a long time, causing downtime and loss of user trust.

Solution

Rollback strategies provide a controlled way to revert a microservice to a previous stable version quickly. This is done by keeping previous versions ready and switching traffic back to them when problems arise, minimizing downtime and impact on users.

Architecture

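[Diagram: User Request → API Gateway → active microservice version; a Deployment Tool manages the versions and controls which one receives traffic]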

This diagram shows how user requests flow through an API Gateway to the active microservice version. The Deployment Tool manages the microservice versions and can roll back by switching traffic to a previous version.
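
The flow in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ApiGateway`, `DeploymentTool`, and the `history` list are hypothetical names, and the handlers are plain functions standing in for deployed service versions.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[str], str]


class ApiGateway:
    """Hypothetical gateway: forwards each request to the active version."""

    def __init__(self) -> None:
        self.versions: Dict[str, Handler] = {}
        self.active: Optional[str] = None

    def handle(self, request: str) -> str:
        if self.active is None:
            raise RuntimeError("no active version")
        return self.versions[self.active](request)


class DeploymentTool:
    """Hypothetical deployment tool: registers versions and switches traffic."""

    def __init__(self, gateway: ApiGateway) -> None:
        self.gateway = gateway
        self.history: List[str] = []  # versions that were active earlier

    def deploy(self, version: str, handler: Handler) -> None:
        # Remember the outgoing version so it can be restored later.
        if self.gateway.active is not None:
            self.history.append(self.gateway.active)
        self.gateway.versions[version] = handler
        self.gateway.active = version

    def rollback(self) -> str:
        if not self.history:
            raise RuntimeError("no previous version to roll back to")
        self.gateway.active = self.history.pop()
        return self.gateway.active


gateway = ApiGateway()
tool = DeploymentTool(gateway)
tool.deploy("v1.0", lambda request: "ok")
tool.deploy("v2.0", lambda request: "error")  # stands in for a buggy release

before = gateway.handle("request")
restored = tool.rollback()
after = gateway.handle("request")
print(before, restored, after)
```

Keeping a history of previously active versions means the rollback target does not have to be hard-coded.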

Trade-offs

✓ Pros

→ Minimizes downtime by quickly reverting to a known stable version.
→ Reduces risk of prolonged outages caused by faulty deployments.
→ Supports continuous delivery by enabling safe experimentation.
→ Improves user experience by maintaining service availability.

✗ Cons

→ Requires maintaining multiple versions and extra storage.
→ Rollback may cause data inconsistencies if schema changes are involved.
→ Complexity increases with dependencies between microservices.
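
One common way to reduce the schema-related risk is to keep schema changes additive, so the previous version can still read data written by the new one. The sketch below illustrates the idea with plain dictionaries. The record layout and the function names (`write_v2`, `read_v1`, `write_v2_breaking`) are assumptions made for the example, not part of any particular framework.

```python
from typing import Any, Dict


def write_v2(user_id: int, full_name: str) -> Dict[str, Any]:
    """v2 adds 'display_name' but keeps writing the legacy 'name' field."""
    return {"id": user_id, "name": full_name, "display_name": full_name.title()}


def write_v2_breaking(user_id: int, full_name: str) -> Dict[str, Any]:
    """Counter-example: v2 renames the field, so v1 can no longer read it."""
    return {"id": user_id, "display_name": full_name.title()}


def read_v1(record: Dict[str, Any]) -> str:
    """v1 only knows 'id' and 'name'; extra fields are ignored."""
    return f"{record['id']}:{record['name']}"


# Additive change: data written by v2 is still readable after rolling back to v1.
compatible = read_v1(write_v2(7, "ada lovelace"))

# Breaking change: the same rollback fails on records written by v2.
try:
    read_v1(write_v2_breaking(7, "ada lovelace"))
    breaking_readable = True
except KeyError:
    breaking_readable = False

print(compatible, breaking_readable)
```

With the additive write, the old field can be dropped in a later release, once rolling back to v1 is no longer needed.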

Use rollback strategies when deploying microservices in production environments with frequent releases and when uptime is critical, typically at scales of hundreds or more requests per second.

Avoid rollback strategies in very simple systems with infrequent deployments or when the cost of maintaining multiple versions outweighs the benefits, such as small internal tools with low traffic.

Real World Examples

Netflix

Netflix uses automated rollback strategies to revert microservice versions instantly when new deployments cause errors, ensuring uninterrupted streaming.

Uber

Uber employs rollback strategies to quickly switch back to previous microservice versions during incidents, minimizing impact on ride requests.

Amazon

Amazon uses rollback mechanisms in their deployment pipelines to maintain high availability of their e-commerce services during frequent updates.

Code Example

Before applying rollback strategies, the microservice runs a buggy version without a way to revert quickly. After applying rollback, the service keeps multiple versions and can switch the active version back to a stable one instantly, minimizing downtime.


### Before rollback strategy (naive deployment)
class Microservice:
    def __init__(self):
        self.version = 'v2.0'

    def handle_request(self, request):
        if self.version == 'v2.0':
            # buggy code
            return 'error'
        else:
            return 'ok'

service = Microservice()
print(service.handle_request('request'))  # prints 'error'; there is no way to switch versions


### After rollback strategy applied
class Microservice:
    def __init__(self):
        self.active_version = 'v2.0'
        self.versions = {
            'v1.0': self.v1_0_handler,
            'v2.0': self.v2_0_handler
        }

    def v1_0_handler(self, request):
        return 'ok'

    def v2_0_handler(self, request):
        # buggy code
        return 'error'

    def handle_request(self, request):
        return self.versions[self.active_version](request)

    def rollback(self):
        self.active_version = 'v1.0'

service = Microservice()
print(service.handle_request('request'))  # prints 'error' (v2.0 is active)
service.rollback()                        # switch the active version back to v1.0
print(service.handle_request('request'))  # prints 'ok'
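
The example above triggers the rollback by hand. In practice the switch is often automated from a health signal. The following is a minimal sketch of that idea, assuming a sliding window of recent request outcomes and an error-rate threshold. `ErrorRateMonitor`, its parameters, and the `state` dictionary are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Callable, Deque


class ErrorRateMonitor:
    """Calls on_rollback once when the recent error rate reaches the threshold."""

    def __init__(self, on_rollback: Callable[[], None],
                 window: int = 10, threshold: float = 0.5) -> None:
        self.results: Deque[bool] = deque(maxlen=window)  # True means success
        self.on_rollback = on_rollback
        self.threshold = threshold
        self.triggered = False

    def record(self, success: bool) -> None:
        self.results.append(success)
        # Wait for a full window so one early failure does not trigger a rollback.
        window_full = len(self.results) == self.results.maxlen
        error_rate = self.results.count(False) / len(self.results)
        if window_full and error_rate >= self.threshold and not self.triggered:
            self.triggered = True
            self.on_rollback()


# Stand-in for the service's active version.
state = {"active": "v2.0"}


def rollback() -> None:
    state["active"] = "v1.0"


monitor = ErrorRateMonitor(rollback, window=4, threshold=0.5)
for outcome in [True, False, True, False]:
    monitor.record(outcome)

print(state["active"])
```

The window size and threshold are tuning choices: a small window reacts faster but is more likely to roll back on a brief spike.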


Alternatives

Blue-Green Deployment

Deploys the new version alongside the old one and switches all traffic at once, avoiding downtime. The old environment stays running, so rolling back means switching traffic back to it.

Use when: you want zero-downtime deployments and can afford double infrastructure temporarily.
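
A rough sketch of the blue-green switch, assuming two environments represented by plain handler functions. `BlueGreenRouter` and its method names are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class BlueGreenRouter:
    """Sends all traffic to exactly one of two environments."""

    def __init__(self, blue: Handler, green: Handler) -> None:
        self.environments: Dict[str, Handler] = {"blue": blue, "green": green}
        self.live = "blue"

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def switch(self) -> str:
        """Point all traffic at the idle environment; calling it again reverts."""
        self.live = self.idle()
        return self.live

    def handle(self, request: str) -> str:
        return self.environments[self.live](request)


router = BlueGreenRouter(
    blue=lambda request: "ok",      # current stable release
    green=lambda request: "error",  # new release, buggy in this example
)

first_live = router.switch()          # cut over to green
during = router.handle("request")
second_live = router.switch()         # roll back by cutting over to blue
after = router.handle("request")
print(first_live, during, second_live, after)
```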

Canary Deployment

Gradually shifts traffic to the new version to detect issues early before full rollout, reducing rollback frequency.

Use when: you want to test new versions on a small user subset before full deployment.
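
A small sketch of canary routing, assuming users are identified by an integer ID and assigned to one of 100 buckets. `CanaryRouter` and the bucketing rule are illustrative assumptions; real systems usually hash a user or request key.

```python
class CanaryRouter:
    """Routes a configurable percentage of users to the canary version."""

    def __init__(self) -> None:
        self.canary_percent = 0  # start with all traffic on the stable version

    def set_canary_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.canary_percent = percent

    def route(self, user_id: int) -> str:
        # Deterministic bucketing keeps each user on the same version.
        bucket = user_id % 100
        return "canary" if bucket < self.canary_percent else "stable"


def count_canary(router: CanaryRouter, users: range) -> int:
    return sum(1 for user_id in users if router.route(user_id) == "canary")


router = CanaryRouter()
users = range(1000)

router.set_canary_percent(10)           # expose 10% of users to the new version
exposed = count_canary(router, users)

router.set_canary_percent(0)            # abort: all traffic back to stable
after_abort = count_canary(router, users)

print(exposed, after_abort)
```

Because only a fraction of users ever reach the canary, aborting it affects far fewer requests than rolling back a full deployment.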

Feature Flags

Controls new features at runtime without redeploying, so a faulty feature can be disabled quickly instead of rolling back the whole deployment.

Use when: you want fine-grained control over features and faster recovery from issues.
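
A minimal feature-flag sketch, assuming an in-memory flag store. `FeatureFlags`, the `new_checkout` flag, and the `checkout` function are hypothetical; production systems typically read flags from a shared configuration service.

```python
from typing import Dict


class FeatureFlags:
    """In-memory flag store; unknown flags default to disabled."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)


flags = FeatureFlags()


def checkout(order_id: str) -> str:
    # Both code paths ship in the same deployment; the flag picks one at runtime.
    if flags.is_enabled("new_checkout"):
        return f"new checkout for {order_id}"
    return f"old checkout for {order_id}"


flags.set("new_checkout", True)
with_flag = checkout("A1")

flags.set("new_checkout", False)   # disable the feature without redeploying
without_flag = checkout("A1")

print(with_flag)
print(without_flag)
```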

Summary

Rollback strategies prevent prolonged outages by quickly reverting to stable microservice versions.

They require maintaining multiple versions and managing traffic routing between them.

Rollback is essential in production systems with frequent deployments and high availability needs.

Practice

1. What is the main purpose of a rollback strategy in microservices?

easy

A. To quickly undo a bad deployment and restore the previous stable state

B. To add new features to the system without downtime

C. To permanently delete old versions of services

D. To monitor system performance continuously
