Blue-green deployment in Microservices - System Design Guide

Problem Statement
Deploying a new version of a service directly to the live environment can cause downtime or unexpected failures. If the new version has bugs, users see errors or service interruptions, and rolling back is slow and risky.
Solution
Blue-green deployment solves this by running two identical environments: one active (blue) serving all traffic, and one idle (green) running the new version. Once the green environment has been tested, all traffic is switched from blue to green in a single step, minimizing downtime and enabling quick rollback by switching back if needed.
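The switch described above can be modeled in a few lines. This is a minimal sketch, not a real load balancer: the class name BlueGreenRouter and its methods are hypothetical, chosen only to show that cutover and rollback are the same swap operation.

```python
class BlueGreenRouter:
    """Hypothetical model of the component that decides which environment is live."""

    def __init__(self, live="blue", idle="green"):
        self.live = live
        self.idle = idle

    def route(self, request):
        # Every request goes to whichever environment is currently live.
        return f"{self.live} handled {request}"

    def switch(self):
        # Cutover and rollback are the same operation: swap live and idle.
        self.live, self.idle = self.idle, self.live
        return self.live


router = BlueGreenRouter()
before = router.route("GET /orders")
router.switch()                        # cut over to green
after = router.route("GET /orders")
router.switch()                        # roll back to blue
rolled_back = router.route("GET /orders")
```

In a real system the swap would typically be a load balancer or DNS configuration change rather than an in-process attribute.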
Architecture
[Diagram: Load Balancer in front of Blue Env and Green Env]

This diagram shows a load balancer directing traffic to the blue environment, which is live. The green environment runs the new version and is idle until traffic switches over.

Trade-offs
✓ Pros
Zero downtime deployment by switching traffic instantly.
Easy rollback by switching back to the previous environment.
Safe testing of the new version in a production-like environment before release.
Reduces risk of deployment failures affecting users.
✗ Cons
Requires double infrastructure, increasing cost.
Data synchronization between environments can be complex.
Switching traffic may cause session or cache inconsistencies if not handled.
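One common way to soften the session problem listed above is to keep session state outside both environments. The sketch below assumes that approach: the Environment class is hypothetical, and a plain dict stands in for an external session store.

```python
class Environment:
    """Hypothetical environment that looks up sessions in the store it is given."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        user = self.sessions.get(session_id)
        if user is None:
            return f"{self.name}: not logged in"
        return f"{self.name}: {user}"


# Assumption: the dict stands in for a store that lives outside both environments.
shared_store = {}
blue = Environment("blue", shared_store)
green = Environment("green", shared_store)

blue.login("s1", "alice")               # user logs in while blue is live
result_shared = green.whoami("s1")      # after the switch, green still knows the session

# For contrast: a green environment with its own private store loses the session.
isolated_green = Environment("green", {})
result_isolated = isolated_green.whoami("s1")
```

With the shared store the user stays logged in across the switch; with a private store the same request appears unauthenticated.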
Use when: You need zero-downtime deployments and quick rollback for critical services with high availability requirements, typically at traffic levels of hundreds of requests per second or more.
Avoid when: Infrastructure cost is a major constraint, or the system state cannot be easily duplicated between environments, such as tightly coupled databases without replication.
Real World Examples
Amazon
Amazon uses blue-green deployment to release new versions of their e-commerce services without downtime, ensuring customers always have a smooth shopping experience.
Netflix
Netflix applies blue-green deployment to update streaming services, allowing instant rollback if new code causes playback issues.
Uber
Uber uses blue-green deployment to update ride matching services, minimizing disruption during peak hours.
Alternatives
Canary deployment
Gradually shifts a small percentage of traffic to the new version instead of switching all at once.
Use when: When you want to monitor new version behavior on a small user subset before full rollout.
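To contrast with the all-at-once switch, a canary split can be sketched as a per-user routing decision. This is an illustrative sketch only: pick_version and the bucket scheme are assumptions, using a CRC32 hash so that each user consistently lands on the same version.

```python
import zlib


def pick_version(user_id: str, canary_percent: int) -> str:
    """Route a user to 'canary' or 'stable' based on a stable hash bucket (0-99)."""
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return "canary" if bucket < canary_percent else "stable"


users = [f"user-{i}" for i in range(1000)]

# Users on the canary at 10% remain on it when the rollout widens to 50%.
at_10 = {u for u in users if pick_version(u, 10) == "canary"}
at_50 = {u for u in users if pick_version(u, 50) == "canary"}
```

Setting canary_percent to 0 or 100 reduces this to the two states of a blue-green switch.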
Rolling deployment
Updates instances one by one in place without maintaining two full environments.
Use when: When infrastructure cost is a concern and small downtime or degraded performance is acceptable.
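A rolling deployment can be sketched as updating a fixed fleet in batches. The function below is a simplified illustration (rolling_update and its parameters are hypothetical) that shows the mixed-version states a rolling update passes through, which blue-green avoids.

```python
def rolling_update(instances, new_version, batch_size=1):
    """Yield the fleet state after each batch of instances is updated in place."""
    fleet = list(instances)
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version
        yield list(fleet)


steps = list(rolling_update(["v1", "v1", "v1"], "v2"))
# steps[0] and steps[1] contain both v1 and v2 instances serving traffic at once.
```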
Summary
Blue-green deployment runs two identical environments to enable zero downtime releases.
It allows instant traffic switching and quick rollback to reduce deployment risks.
It requires extra infrastructure and careful data synchronization between environments.

Practice

1. What is the main purpose of blue-green deployment in microservices?
easy
A. To improve database query speed
B. To increase the number of microservices in the system
C. To reduce downtime by switching traffic between two identical environments
D. To simplify the codebase by merging services

Solution

  1. Step 1: Understand blue-green deployment concept

    Blue-green deployment uses two identical environments to avoid downtime during updates.
  2. Step 2: Identify the main goal

    The main goal is to switch traffic between environments to keep the system available without interruption.
  3. Final Answer:

    To reduce downtime by switching traffic between two identical environments -> Option C
  4. Quick Check:

    Blue-green deployment = reduce downtime [OK]
Hint: Blue-green means two environments for zero downtime
Common Mistakes:
  • Confusing deployment with scaling
  • Thinking it improves database speed
  • Assuming it merges microservices
2. Which of the following is the correct sequence in a blue-green deployment?
easy
A. Deploy to green, test, switch traffic from blue to green
B. Deploy to blue, switch traffic, then test on green
C. Switch traffic first, then deploy to blue
D. Deploy to green, switch traffic, then test on blue

Solution

  1. Step 1: Recall deployment steps

    In blue-green deployment, new code is deployed to the inactive environment (green).
  2. Step 2: Test and switch traffic

    After testing green, traffic is switched from blue (active) to green (new).
  3. Final Answer:

    Deploy to green, test, switch traffic from blue to green -> Option A
  4. Quick Check:

    Deploy-test-switch = A [OK]
Hint: Deploy to inactive env, test, then switch traffic
Common Mistakes:
  • Switching traffic before testing
  • Testing on active environment
  • Deploying after switching traffic
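The deploy, test, switch ordering from this question can be sketched as a single function. This is an illustrative sketch under assumptions: the state dictionary, the release function, and the smoke_test callback are all hypothetical names, not part of any real deployment tool.

```python
def release(state, new_version, smoke_test):
    """Deploy to the idle environment, test it, and switch only if the test passes."""
    idle = "green" if state["live"] == "blue" else "blue"

    state["versions"][idle] = new_version       # 1. deploy to the idle environment
    if not smoke_test(idle, new_version):       # 2. test it while it receives no traffic
        return False                            #    live traffic is left untouched
    state["live"] = idle                        # 3. switch traffic
    return True


state = {"live": "blue", "versions": {"blue": "v1", "green": "v1"}}

# A passing test lets the switch happen: green becomes live with v2.
ok = release(state, "v2", smoke_test=lambda env, version: True)

# A failing test stops before step 3: green stays live, blue holds the rejected v3.
failed = release(state, "v3", smoke_test=lambda env, version: False)
```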
3. Consider this simplified code snippet for switching traffic in blue-green deployment:
current_env = "blue"
new_env = "green"
if current_env == "blue":
    current_env = new_env
else:
    current_env = "blue"
print(current_env)
What will be the output?
medium
A. "blue"
B. None
C. SyntaxError
D. "green"

Solution

  1. Step 1: Analyze initial variables

    current_env starts as "blue", new_env is "green".
  2. Step 2: Evaluate the if condition

    Since current_env == "blue", it sets current_env = new_env, which is "green".
  3. Final Answer:

    "green" -> Option D
  4. Quick Check:

    Switching from blue to green prints green [OK]
Hint: If current is blue, switch to green
Common Mistakes:
  • Confusing assignment direction
  • Expecting original value to print
  • Thinking code has syntax error
4. A team uses blue-green deployment but users report downtime during the switch. What is the most likely cause?
medium
A. The old environment was not shut down
B. Traffic was switched before the new environment was fully ready
C. The database was not updated
D. The new environment was tested too long

Solution

  1. Step 1: Understand downtime cause in blue-green

    Downtime usually happens if traffic switches before the new environment is ready to serve requests.
  2. Step 2: Evaluate options

    A still-running old environment and a pending database update do not cause immediate downtime during the switch; testing for too long delays the deployment but does not cause downtime.
  3. Final Answer:

    Traffic was switched before the new environment was fully ready -> Option B
  4. Quick Check:

    Premature traffic switch = downtime [OK]
Hint: Switch traffic only after new env is ready
Common Mistakes:
  • Assuming old env causes downtime
  • Ignoring readiness checks
  • Blaming database updates for switch downtime
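The readiness check this question points to can be sketched as a polling gate that runs before the switch. The function name, parameters, and the simulated probe below are assumptions for illustration; a real check would call the new environment's health endpoint.

```python
import time


def wait_until_ready(check, timeout_s=5.0, interval_s=0.01, required_successes=3):
    """Return True once `check` passes several times in a row, False on timeout."""
    deadline = time.monotonic() + timeout_s
    streak = 0
    while time.monotonic() < deadline:
        # A single failure resets the streak, so a flapping service is not accepted.
        streak = streak + 1 if check() else 0
        if streak >= required_successes:
            return True
        time.sleep(interval_s)
    return False


# Simulated green environment that starts failing and then becomes healthy.
probes = iter([False, False, False, True, True, True])
ready = wait_until_ready(lambda: next(probes, True))

# An environment that never becomes healthy times out, so traffic should stay on blue.
never_ready = wait_until_ready(lambda: False, timeout_s=0.05)
```

Requiring consecutive successes is a design choice here; it trades a slightly slower switch for protection against an environment that passes one probe and then fails.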
5. You manage a critical microservices system using blue-green deployment. After switching traffic to green, you discover a severe bug. What is the best immediate action to minimize downtime?
hard
A. Switch traffic back to blue environment immediately
B. Fix the bug in green environment and keep traffic there
C. Restart both environments simultaneously
D. Deploy a new environment and switch traffic there

Solution

  1. Step 1: Understand rollback in blue-green deployment

    Blue-green allows quick rollback by switching traffic back to the previous stable environment (blue).
  2. Step 2: Evaluate options for minimizing downtime

    Fixing the bug in green delays recovery; restarting both environments causes downtime; deploying a new environment takes time.
  3. Final Answer:

    Switch traffic back to blue environment immediately -> Option A
  4. Quick Check:

    Rollback by switching traffic = minimize downtime [OK]
Hint: Roll back fast by switching traffic to the old env
Common Mistakes:
  • Trying to fix bug before rollback
  • Restarting both environments causing downtime
  • Deploying new env wastes time
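The rollback decision from this question can also be automated. The sketch below assumes a simple rule, rolling back when the share of 5xx responses exceeds a threshold; the function names and the 5% default are hypothetical, not a recommended setting.

```python
def should_roll_back(status_codes, max_error_rate=0.05):
    """Return True if the share of 5xx responses exceeds the allowed error rate."""
    if not status_codes:
        return False
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes) > max_error_rate


def monitor_and_roll_back(live, previous, status_codes, max_error_rate=0.05):
    """Return the environment that should receive traffic after checking recent responses."""
    if should_roll_back(status_codes, max_error_rate):
        return previous     # switch traffic back to the last known-good environment
    return live


# 1% errors after the switch: stay on green.
healthy = monitor_and_roll_back("green", "blue", [200] * 99 + [500])

# 20% errors after the switch: traffic goes back to blue.
broken = monitor_and_roll_back("green", "blue", [200] * 80 + [500] * 20)
```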