Design: Parallel Running Deployment System
This design focuses on the deployment and routing architecture needed to run microservice versions in parallel. It excludes the detailed implementation of microservice business logic, and it covers database schema only as far as needed to support parallel running.
Functional Requirements
FR1: Support running two versions of a microservice simultaneously in production.
FR2: Route a configurable percentage of user requests to the new version while the rest go to the old version.
FR3: Ensure data consistency between versions during parallel operation.
FR4: Allow quick rollback to the old version if issues occur in the new version.
FR5: Monitor performance and errors separately for each version.
FR6: Minimize downtime during deployment and switching.
Non-Functional Requirements
NFR1: Handle up to 10,000 concurrent users with low latency (p99 < 200ms).
NFR2: Availability target of 99.9% uptime (less than 8.77 hours downtime per year).
NFR3: Data may be eventually consistent, but the system must provide mechanisms to detect divergence between versions.
NFR4: Deployment changes should not cause user-visible errors or downtime.
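The downtime figure in NFR2 follows directly from the availability target. A minimal sketch of that arithmetic, assuming an average year of 365.25 days (the function name and the 30-day month are illustrative choices, not part of the requirements):

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, averaging in leap years


def downtime_budget_hours(availability_pct: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime allowed in the period for an availability target."""
    return period_hours * (1 - availability_pct / 100)


yearly = downtime_budget_hours(99.9)
monthly = downtime_budget_hours(99.9, period_hours=30 * 24)
print(f"{yearly:.2f} h/year, {monthly * 60:.1f} min per 30-day month")
```

This prints 8.77 h/year and 43.2 min per 30-day month, which is the budget every deployment, traffic shift, and rollback has to fit inside.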
Think Before You Design
Key Components
API Gateway or Load Balancer with traffic splitting
Service Registry and Discovery
Centralized Logging and Monitoring
Database with version-aware schema or synchronization
Deployment Automation (CI/CD pipeline)
Feature Flags or Configuration Management
Design Patterns
Blue-Green Deployment
Canary Releases
Feature Toggles
Eventual Consistency
Circuit Breaker for fault tolerance
Reference Architecture
                +-------------------+
                |    API Gateway    |
                | (Traffic Splitter)|
                +---------+---------+
                          |
          +---------------+---------------+
          |                               |
+---------v---------+           +---------v---------+
|    Old Version    |           |    New Version    |
|    Microservice   |           |    Microservice   |
+---------+---------+           +---------+---------+
          |                               |
          +---------------+---------------+
                          |
                +---------v---------+
                |  Shared Database  |
                | (Version-aware or |
                |   synchronized)   |
                +-------------------+
Additional components:
- Monitoring & Logging system collects metrics from both versions.
- Deployment Automation manages rollout and rollback.
- Configuration Service controls traffic split ratios.
Components
- API Gateway (Kong, NGINX, or AWS API Gateway): Routes incoming requests to the old or new microservice version based on the configured traffic split.
- Old Version Microservice (Docker container or Kubernetes pod): Handles requests with the stable, currently deployed version.
- New Version Microservice (Docker container or Kubernetes pod): Handles requests with the new version during parallel running.
- Shared Database (PostgreSQL or MongoDB with version-aware schema or synchronization): Stores data accessible by both versions, ensuring consistency.
- Monitoring & Logging (Prometheus, Grafana, ELK Stack): Collects and visualizes metrics and logs separately for each version.
- Deployment Automation (Jenkins, GitHub Actions, ArgoCD): Automates deployment, traffic shifting, and rollback.
- Configuration Service (Consul, etcd, or custom config server): Manages traffic split ratios and feature flags.
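To make the gateway's traffic split concrete, here is a minimal sketch of one way to decide which version serves a request. It assumes the split is keyed on a user ID so that each user consistently sees the same version; that stickiness is a design choice of this sketch rather than something the design above mandates, and the function names are hypothetical. Real gateways such as Kong or NGINX configure weighted routing declaratively instead.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_version(user_id: str, new_version_percent: int) -> str:
    """Return "new" for roughly new_version_percent of users, otherwise "old"."""
    if not 0 <= new_version_percent <= 100:
        raise ValueError("new_version_percent must be between 0 and 100")
    return "new" if bucket_for(user_id) < new_version_percent else "old"


# Example: with a 10% split, about one in ten users lands on the new version.
users = [f"user-{i}" for i in range(10_000)]
share = sum(choose_version(u, 10) == "new" for u in users) / len(users)
print(f"share routed to new version: {share:.1%}")
```

Because the bucket depends only on the user ID, raising the percentage from 10 to 25 keeps the original 10% on the new version and adds more users, which avoids flipping anyone back and forth during a rollout.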
Request Flow
1. The user sends a request to the API Gateway.
2. The API Gateway checks the traffic split configuration.
3. The request is routed to either the old or the new microservice version accordingly.
4. The microservice processes the request and reads from or writes to the shared database.
5. Both versions send metrics and errors to the Monitoring & Logging system.
6. Deployment Automation gradually shifts more traffic to the new version.
7. If issues are detected, a rollback routes all traffic back to the old version.
8. After a period of stable operation, the old version is decommissioned.
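Steps 6 and 7 of the flow above can be sketched as a small control loop. This is an illustration under stated assumptions, not a definitive implementation: the step percentages, the 1% error threshold, and the `error_rate_for` callback (standing in for a query against the monitoring system) are all hypothetical.

```python
from typing import Callable, Iterable, List


def run_rollout(
    steps: Iterable[int],
    error_rate_for: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> List[int]:
    """Walk through traffic percentages and roll back to 0 on a high error rate.

    Returns the history of split percentages applied to the new version.
    """
    history: List[int] = []
    for percent in steps:
        history.append(percent)  # apply the next traffic split
        if error_rate_for(percent) > max_error_rate:
            history.append(0)  # rollback: all traffic returns to the old version
            return history
    return history


healthy = run_rollout([10, 25, 50, 100], lambda percent: 0.001)
failing = run_rollout([10, 25, 50, 100],
                      lambda percent: 0.05 if percent >= 50 else 0.001)
print(healthy)  # [10, 25, 50, 100]
print(failing)  # [10, 25, 50, 0]
```

In a real pipeline each step would also wait for a soak period before reading metrics; the loop here only shows the decision logic.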
Database Schema
Entities are designed to support both versions, either by:
- Using backward-compatible schema changes,
- Or maintaining version tags on records,
- Or synchronizing data asynchronously if schema differs.
Relationships remain consistent to support both versions' data needs.
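As an illustration of the version-tag option, the sketch below reads records written by either version and normalizes them to one shape. The `schema_version` field and the customer fields are hypothetical examples chosen for this sketch; the original design does not specify a schema.

```python
def normalize_customer(record: dict) -> dict:
    """Return a customer record in the v2 shape, whichever version wrote it."""
    version = record.get("schema_version", 1)  # untagged rows are treated as v1
    if version == 1:
        # v1 stored a single "name" field; split it on the first space.
        first, _, last = record["name"].partition(" ")
        return {"id": record["id"], "first_name": first, "last_name": last}
    if version == 2:
        return {
            "id": record["id"],
            "first_name": record["first_name"],
            "last_name": record["last_name"],
        }
    raise ValueError(f"unknown schema_version: {version}")


old_row = {"id": 1, "name": "Ada Lovelace"}
new_row = {"id": 2, "schema_version": 2,
           "first_name": "Grace", "last_name": "Hopper"}
print(normalize_customer(old_row))
print(normalize_customer(new_row))
```

Failing loudly on an unknown version is deliberate here: silently guessing at a record's shape is one way divergence between versions goes unnoticed.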
Scaling Discussion
Bottlenecks
API Gateway becomes a bottleneck if not horizontally scalable.
Database contention due to simultaneous writes from two versions.
Monitoring system overload with doubled metrics volume.
Deployment automation complexity increases with multiple versions.
Data inconsistency risk if schema changes are incompatible.
Solutions
Use horizontally scalable API Gateway clusters with load balancing.
Implement database sharding or use multi-master replication for write scaling.
Aggregate and sample metrics to reduce monitoring load.
Automate deployment with robust rollback and health checks.
Use feature flags and backward-compatible schema migrations to minimize inconsistency.
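The "aggregate" part of the monitoring suggestion can be sketched as per-version counters that are exported periodically instead of shipping every raw event. The class below is a hypothetical in-process illustration; in practice a Prometheus client with a version label would play this role.

```python
from collections import defaultdict


class VersionMetrics:
    """Keeps request and error counts separately for each service version."""

    def __init__(self) -> None:
        self.requests = defaultdict(int)
        self.errors = defaultdict(int)

    def record(self, version: str, ok: bool) -> None:
        """Count one request for the given version, and one error if it failed."""
        self.requests[version] += 1
        if not ok:
            self.errors[version] += 1

    def error_rate(self, version: str) -> float:
        """Return errors / requests for the version, or 0.0 if it saw no traffic."""
        total = self.requests[version]
        return self.errors[version] / total if total else 0.0


metrics = VersionMetrics()
for ok in (True, True, True, False):
    metrics.record("new", ok)
metrics.record("old", True)
print(metrics.error_rate("new"), metrics.error_rate("old"))  # 0.25 0.0
```

Keeping the version as the key is what lets a rollback decision compare the new version against the old one rather than against a blended average.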
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain how traffic splitting enables safe parallel running.
Discuss data consistency challenges and solutions.
Highlight monitoring and rollback importance.
Describe how deployment automation reduces human error.
Mention scalability considerations and how to address bottlenecks.
Practice
1. What is the main purpose of parallel running in microservices?
easy
A. To run old and new systems together to ensure smooth transition
B. To replace the old system immediately without testing
C. To run only the new system and discard the old one
D. To run multiple unrelated services in parallel
Solution
Step 1: Understand the concept of parallel running
Parallel running means running old and new systems side by side to compare their outputs and ensure the new system works correctly.
Step 2: Identify the purpose in microservices
This approach helps catch errors and ensures a smooth transition before fully switching to the new system.
Final Answer:
To run old and new systems together to ensure smooth transition -> Option A
Quick Check:
Parallel running = run old and new systems together [OK]
Hint: Parallel running means running old and new systems side by side [OK]
Common Mistakes:
Thinking parallel running means immediate replacement
Confusing parallel running with running unrelated services
Assuming old system is discarded immediately
2. Which of the following is the correct way to implement parallel running in a microservices upgrade?
easy
A. Deploy new microservice version alongside old one and route a copy of requests to both
B. Stop old microservice and deploy new one immediately
C. Deploy new microservice and ignore old service logs
D. Run new microservice only during off-peak hours
Solution
Step 1: Understand deployment in parallel running
Parallel running requires both old and new versions to run simultaneously to compare results.
Step 2: Identify correct routing method
Routing a copy of requests to both versions allows output comparison without disrupting users.
Final Answer:
Deploy new microservice version alongside old one and route a copy of requests to both -> Option A
Quick Check:
Parallel running = deploy both and route requests to both [OK]
Hint: Route requests to both old and new services in parallel [OK]
Common Mistakes:
Stopping old service before testing new one
Ignoring logs from old service
Running new service only at specific times
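Routing a copy of each request to both versions, as in the answer to question 2, can be sketched as follows. The handler signatures and the `on_result` callback are assumptions made for this example. The shadow call is synchronous here for brevity; a production system would normally send it asynchronously so it cannot add latency for the user.

```python
from typing import Callable

Handler = Callable[[dict], dict]


def handle_with_shadow(
    request: dict,
    old: Handler,
    new: Handler,
    on_result: Callable[[dict, dict, object], None],
) -> dict:
    """Serve the request from the old version and mirror a copy to the new one."""
    primary = old(request)  # the user always receives the old version's answer
    try:
        shadow = new(dict(request))  # shallow copy keeps the original request intact
    except Exception as exc:  # a failing shadow must never affect the user
        on_result(request, primary, exc)
    else:
        on_result(request, primary, shadow)
    return primary


observed = []
response = handle_with_shadow(
    {"order_id": 7},
    old=lambda req: {"total": 100},
    new=lambda req: {"total": 100},
    on_result=lambda req, old_resp, new_resp: observed.append((old_resp, new_resp)),
)
print(response, observed)
```

One caveat with this style of mirroring: if the new version writes to the shared database, each mirrored request can produce a duplicate side effect, so shadowed handlers are usually limited to reads or pointed at an isolated store.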
3. Consider a microservice system where requests are sent to both old and new versions during parallel running. If the old service returns response A and the new service returns response B, what should the system do?
medium
A. Ignore the difference and continue using the new service
B. Switch back to the old service permanently
C. Stop the old service immediately
D. Log the difference and alert engineers for investigation
Solution
Step 1: Understand output comparison in parallel running
Parallel running compares outputs to detect discrepancies between old and new services.
Step 2: Decide action on output mismatch
If outputs differ, the system should log the difference and alert engineers to investigate before switching fully.
Final Answer:
Log the difference and alert engineers for investigation -> Option D
Quick Check:
Output mismatch = log and alert [OK]
Hint: Log and alert on output differences during parallel running [OK]
Common Mistakes:
Ignoring output differences
Stopping old service too early
Switching back permanently without investigation
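The "log the difference and alert" behaviour from question 3 might look like the sketch below. The logger name, the function names, and the idea of ignoring volatile fields such as timestamps are assumptions for illustration; alerting is reduced to a warning-level log entry.

```python
import logging

logger = logging.getLogger("parallel_run")


def diff_responses(old: dict, new: dict, ignore: frozenset = frozenset()) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs.

    A field missing from one response is reported with None on that side.
    """
    keys = (old.keys() | new.keys()) - ignore
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(keys)
        if old.get(key) != new.get(key)
    }


def report_mismatch(request_id: str, old: dict, new: dict,
                    ignore: frozenset = frozenset()) -> bool:
    """Log a warning when the two versions disagree; return True on mismatch."""
    differences = diff_responses(old, new, ignore)
    if differences:
        logger.warning("response mismatch for %s: %s", request_id, differences)
    return bool(differences)


old_response = {"total": 10, "generated_at": 1}
new_response = {"total": 12, "generated_at": 2}
print(diff_responses(old_response, new_response, frozenset({"generated_at"})))
```

Ignoring fields that are expected to differ keeps the alert signal meaningful; without it, every request would report a mismatch on the timestamp alone.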
4. A team implemented parallel running but noticed that the new service never receives any requests. What is the most likely cause?
medium
A. The new service crashed immediately after deployment
B. The routing logic is only sending requests to the old service
C. The old service is not logging requests
D. The new service is slower than the old one
Solution
Step 1: Analyze routing in parallel running
For parallel running, requests must be routed to both old and new services simultaneously.
Step 2: Identify why new service gets no requests
If the new service never receives requests, the routing logic is most likely sending all traffic to the old service.
Final Answer:
The routing logic is only sending requests to the old service -> Option B
Quick Check:
No requests to new service = routing issue [OK]
Hint: Check routing logic if new service gets no requests [OK]
Common Mistakes:
Assuming new service crashed without checking logs
Blaming old service logs
Thinking speed affects request routing
5. You are designing a parallel running strategy for a microservices system with high traffic. Which approach best balances safety and performance?
hard
A. Route 100% of traffic to new service and keep old service idle
B. Run new service only during low traffic hours without output comparison
C. Route 10% of traffic to new service and 90% to old service, compare outputs, then gradually increase new service traffic
D. Stop old service immediately and monitor new service logs
Solution
Step 1: Understand gradual traffic shifting in parallel running
Gradually increasing traffic to the new service while comparing outputs reduces risk and performance impact.
Step 2: Evaluate options for safety and performance
Routing a small portion initially and increasing after validation balances safety and system load.
Final Answer:
Route 10% of traffic to new service and 90% to old service, compare outputs, then gradually increase new service traffic -> Option C
Quick Check:
Gradual traffic shift with output comparison = safe and performant [OK]
Hint: Start small traffic to new service, compare, then increase [OK]