Microservices · System Design · ~15 mins

Parallel running in Microservices - Deep Dive

Overview - Parallel running
What is it?
Parallel running is a method where two versions of a system run side by side at the same time. This allows teams to compare the old system with the new one by processing the same inputs in both. It helps ensure the new system works correctly before fully switching over. This approach reduces risks during upgrades or migrations.
Why it matters
Without parallel running, switching to a new system can cause unexpected failures or data loss, impacting users and business operations. Parallel running provides a safety net by letting teams detect issues early while still using the trusted old system. This reduces downtime and builds confidence in the new system's reliability.
Where it fits
Before learning parallel running, you should understand basic system deployment and testing strategies. After mastering it, you can explore advanced deployment techniques like blue-green deployment and canary releases. Parallel running fits into the broader topic of system migration and release management.
Mental Model
Core Idea
Parallel running means running old and new systems side by side to compare results and ensure smooth transition.
Think of it like...
It's like driving two cars on parallel lanes to see if the new car performs as well as the old one before selling the old car.
┌───────────────┐       ┌───────────────┐
│   Old System  │       │   New System  │
└──────┬────────┘       └──────┬────────┘
       │                       │
       │ Same Input Data       │
       ├───────────────────────┤
       │                       │
       ▼                       ▼
┌───────────────┐       ┌───────────────┐
│ Old Output    │       │ New Output    │
└───────────────┘       └───────────────┘
       │                       │
       └─────────Compare───────┘
Build-Up - 6 Steps
1
Foundation: Understanding system migration basics
🤔
Concept: Introduce the idea of moving from an old system to a new one and the challenges involved.
When a company updates its software, it needs to move data and users from the old system to the new one. This process is called migration. Challenges include data loss, downtime, and unexpected bugs. Simple migration without checks can cause failures.
Result
Learners understand why migrating systems is tricky and why careful planning is needed.
Knowing the risks of migration sets the stage for why safer methods like parallel running are necessary.
2
Foundation: Basics of running two systems simultaneously
🤔
Concept: Explain what it means to run two systems at the same time and why it helps.
Running two systems simultaneously means both receive the same inputs and produce outputs independently. This allows teams to compare results and find differences. It helps catch errors in the new system before fully switching.
Result
Learners grasp the core idea of parallel running as a safety check.
Understanding simultaneous operation is key to seeing how parallel running reduces risk.
3
Intermediate: Implementing parallel running in microservices
🤔 Before reading on: Do you think parallel running requires duplicating all services or only critical ones? Commit to your answer.
Concept: Show how to apply parallel running in a microservices architecture by duplicating services or routes.
In microservices, parallel running can mean deploying both old and new versions of services. Incoming requests are sent to both versions, and their responses are compared. Not all services need duplication; focus on critical or changed ones to save resources.
Result
Learners see practical ways to run parallel systems in microservices.
Knowing selective duplication balances safety and resource use in real systems.
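The route-to-both-and-compare idea above can be sketched in a few lines of Python. The two handlers stand in for the old and new service versions; the shipping-fee rule and its boundary bug are invented purely for illustration:

```python
def parallel_run(old_handler, new_handler, request, mismatches):
    """Send the same request to both versions; always serve the old result."""
    old_result = old_handler(request)
    try:
        new_result = new_handler(request)
        if new_result != old_result:
            mismatches.append((request, old_result, new_result))
    except Exception as exc:
        # A crash in the new version must never reach the user.
        mismatches.append((request, old_result, f"error: {exc}"))
    return old_result  # users always see the trusted old output

# Old and new implementations of a shipping-fee rule (hypothetical).
def old_fee(order):
    return 5 if order["total"] < 50 else 0

def new_fee(order):
    return 5 if order["total"] <= 50 else 0  # boundary bug slipped in

mismatches = []
for order in [{"total": 30}, {"total": 50}, {"total": 80}]:
    parallel_run(old_fee, new_fee, order, mismatches)

print(mismatches)  # [({'total': 50}, 0, 5)]
```

Note that only the changed service gets this treatment; unchanged services keep their single deployment.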
4
Intermediate: Handling data consistency during parallel running
🤔 Before reading on: Should the new system write data immediately or wait until fully verified? Commit to your answer.
Concept: Discuss strategies to keep data consistent between old and new systems during parallel running.
Data consistency is crucial. One approach is to write data to both systems simultaneously but only use the old system's data until the new one is verified. Another is to write only to the old system and replay data to the new one for testing. Each has tradeoffs in complexity and risk.
Result
Learners understand how to manage data safely during parallel running.
Understanding data strategies prevents corruption and ensures smooth transition.
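A minimal sketch of the first strategy (write to both, trust only the old data), with plain dicts standing in for the two data stores:

```python
class DualWriteStore:
    """Dual-write sketch: every write goes to both stores, but reads come
    only from the old store until the new system is verified. Plain dicts
    stand in for real databases (an illustrative assumption)."""

    def __init__(self):
        self.old_store = {}
        self.new_store = {}

    def write(self, key, value):
        self.old_store[key] = value      # authoritative write
        try:
            self.new_store[key] = value  # shadow write, compared later
        except Exception:
            pass  # a shadow-write failure must not fail the user's request

    def read(self, key):
        # The old system stays the source of truth during parallel running.
        return self.old_store.get(key)

    def divergent_keys(self):
        """Keys where the stores disagree, checked during verification."""
        keys = set(self.old_store) | set(self.new_store)
        return sorted(k for k in keys
                      if self.old_store.get(k) != self.new_store.get(k))

store = DualWriteStore()
store.write("order-1", {"status": "paid"})
print(store.read("order-1"))   # served from the old store
print(store.divergent_keys())  # [] while both writes succeed
```

The replay strategy would instead feed the old store's write log into the new system offline, trading freshness for lower write-path risk.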
5
Advanced: Monitoring and comparing outputs effectively
🤔 Before reading on: Do you think manual comparison is enough, or are automated tools needed? Commit to your answer.
Concept: Explain how to monitor and compare outputs from both systems to detect differences automatically.
Manual comparison is slow and error-prone. Automated tools can log outputs, compare them, and alert teams on mismatches. Metrics and dashboards help track system health. This automation is essential for large-scale systems.
Result
Learners see how automation improves reliability and speed in parallel running.
Knowing the importance of automation helps scale parallel running safely.
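The logging-and-comparison loop described above might look like the following sketch; the record format and alert threshold are assumptions for illustration:

```python
from collections import Counter

def compare_outputs(pairs, alert_threshold=0.01):
    """Summarize logged (request_id, old_output, new_output) records and
    flag mismatches. Format and threshold are illustrative assumptions."""
    summary = Counter()
    mismatched_ids = []
    for request_id, old_out, new_out in pairs:
        if old_out == new_out:
            summary["match"] += 1
        else:
            summary["mismatch"] += 1
            mismatched_ids.append(request_id)
    total = sum(summary.values())
    rate = summary["mismatch"] / total if total else 0.0
    # In production this would feed a dashboard and page on-call staff.
    if rate > alert_threshold:
        print(f"ALERT: {rate:.1%} of outputs differ, e.g. {mismatched_ids[:5]}")
    return summary, mismatched_ids
```

Running this continuously over the two systems' output logs turns comparison from an afternoon of log-reading into an automatic alert.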
6
Expert: Challenges and surprises in production parallel running
🤔 Before reading on: Do you think parallel running eliminates all risks? Commit to your answer.
Concept: Reveal hidden challenges like timing differences, side effects, and resource overhead in real-world parallel running.
Even with parallel running, subtle issues arise. Timing differences can cause outputs to differ even if logic is correct. Side effects like sending emails or payments must be controlled to avoid duplication. Running two systems doubles resource use, impacting cost and performance.
Result
Learners appreciate the complexity and tradeoffs in production environments.
Understanding these challenges prepares teams to design safer, more efficient parallel running setups.
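One common way to control side effects, sketched below, is to give the new system a recorder that captures outgoing actions instead of performing them; the class and function names are invented for illustration:

```python
class EmailSender:
    """Real side effect; only the trusted old path should hold one."""
    def send(self, to, subject):
        print(f"sending '{subject}' to {to}")

class RecordingSender:
    """Drop-in stand-in for the new system during parallel running: it
    records what WOULD have been sent instead of sending anything."""
    def __init__(self):
        self.outbox = []
    def send(self, to, subject):
        self.outbox.append((to, subject))  # captured, never delivered

def confirm_order(order, sender):
    # ...business logic would run here...
    sender.send(order["email"], "Order confirmed")

# The old system keeps the real sender; the new one gets the recorder,
# so the customer receives exactly one email.
shadow = RecordingSender()
confirm_order({"email": "a@example.com"}, EmailSender())
confirm_order({"email": "a@example.com"}, shadow)
print(shadow.outbox)  # [('a@example.com', 'Order confirmed')]
```

The recorded outbox can then be compared against what the old system actually sent, giving side-effect coverage without duplicate emails or payments.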
Under the Hood
Parallel running works by duplicating input streams to two systems and capturing their outputs independently. Internally, this requires routing layers or proxies that send identical requests to both systems. Outputs are logged and compared by monitoring tools. Data synchronization mechanisms ensure both systems have consistent state or handle eventual consistency. Side effects are isolated or controlled to prevent duplication.
Why designed this way?
Parallel running was designed to reduce risk during system upgrades by providing a live comparison between old and new systems. Alternatives like big-bang cutovers risk total failure, while parallel running allows gradual validation. The design balances safety with operational complexity and resource cost.
┌───────────────┐
│  User Input   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│  Router/Proxy │
└──────┬────────┘
       │
 ┌─────┴─────┐
 │           │
 ▼           ▼
┌───────┐ ┌───────┐
│ Old   │ │ New   │
│System │ │System │
└──┬────┘ └──┬────┘
   │         │
   ▼         ▼
┌───────┐ ┌───────┐
│Output │ │Output │
│Logs   │ │Logs   │
└──┬────┘ └──┬────┘
   │         │
   └───Compare─────┐
                   ▼
             ┌───────────┐
             │ Monitoring│
             │ & Alerts  │
             └───────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does parallel running guarantee zero downtime? Commit to yes or no.
Common Belief: Parallel running always eliminates downtime during system upgrades.
Reality: While parallel running reduces risk, it does not guarantee zero downtime because issues like data sync delays or resource limits can still cause interruptions.
Why it matters: Believing in zero downtime can lead to under-preparedness and unexpected outages during migration.
Quick: Is it safe to let both systems send emails or payments during parallel running? Commit to yes or no.
Common Belief: Both old and new systems can perform all side effects simultaneously without issues.
Reality: Allowing both systems to perform side effects like sending emails or payments can cause duplicates and confusion. Side effects must be controlled or disabled in the new system during testing.
Why it matters: Ignoring this leads to duplicated actions, harming user trust and business operations.
Quick: Does parallel running mean you must duplicate every microservice? Commit to yes or no.
Common Belief: You must run every service in parallel to do parallel running.
Reality: Only critical or changed services need parallel running to save resources and complexity. Others can remain on the old system until fully migrated.
Why it matters: Trying to duplicate everything wastes resources and complicates deployment unnecessarily.
Quick: Does output difference always mean the new system is wrong? Commit to yes or no.
Common Belief: Any difference in outputs between old and new systems means the new system has bugs.
Reality: Differences can arise from timing, non-deterministic processes, or expected improvements. Not all differences indicate errors.
Why it matters: Misinterpreting differences can cause wasted debugging effort and delay deployment.
Expert Zone
1
Timing differences between systems can cause output mismatches even if logic is correct, requiring tolerant comparison methods.
2
Side effects must be carefully isolated or mocked in the new system to avoid duplication during parallel running.
3
Resource overhead from running two systems can impact performance and cost, so selective parallel running is often used.
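Point 1 above calls for tolerant comparison methods; a minimal sketch ignores volatile fields and compares floats within a tolerance (the ignored field names are assumptions for illustration):

```python
import math

def tolerant_equal(old, new, ignore_keys=("timestamp", "request_id"),
                   float_tol=1e-6):
    """Compare two output dicts while tolerating expected differences:
    volatile fields are ignored and floats are compared within a
    tolerance. The ignored field names are illustrative assumptions."""
    keys = (set(old) | set(new)) - set(ignore_keys)
    for k in keys:
        a, b = old.get(k), new.get(k)
        if isinstance(a, float) and isinstance(b, float):
            if not math.isclose(a, b, abs_tol=float_tol):
                return False
        elif a != b:
            return False
    return True

# Same business result, different timestamps and float rounding: a match.
print(tolerant_equal({"total": 10.0, "timestamp": 1},
                     {"total": 10.0 + 1e-9, "timestamp": 2}))  # True
```

Without this tolerance, every timestamp or rounding difference would page the team as a false mismatch.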
When NOT to use
Parallel running is not ideal when resource constraints are tight or when side effects cannot be safely isolated. Alternatives like blue-green deployment or canary releases may be better for gradual rollout without full duplication.
Production Patterns
In production, teams often run parallel running only for critical services or features. They use automated monitoring to compare outputs and gradually increase traffic to the new system. Side effects are disabled or routed carefully. Parallel running is combined with feature flags and rollback plans.
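The gradual traffic increase mentioned above is often done with a deterministic hash-based split, so the same request always lands on the same version; the hashing scheme here is an illustrative choice, not a prescribed one:

```python
import hashlib

def route(request_id: str, new_traffic_pct: int) -> str:
    """Deterministic traffic split: send a fixed percentage of requests
    to the new system, and always the same ones, so a given user does
    not flip between versions as the percentage ramps up."""
    bucket = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % 100
    return "new" if bucket < new_traffic_pct else "old"

print(route("user-42", 0))    # old
print(route("user-42", 100))  # new
```

Raising `new_traffic_pct` from 0 toward 100 (behind a feature flag, with a rollback plan) implements the gradual cutover the pattern describes.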
Connections
Blue-Green Deployment
Alternative deployment strategy
Understanding parallel running clarifies how blue-green deployment differs by switching traffic fully rather than running systems simultaneously.
Canary Releases
Builds on gradual rollout concepts
Parallel running helps validate new versions before canary releases gradually expose users to changes.
Scientific Experimentation
Shares the pattern of control and test groups
Parallel running mirrors running control and test groups in experiments to compare outcomes before full adoption.
Common Pitfalls
#1 Allowing both systems to perform side effects like sending emails or processing payments.
Wrong approach: Both systems send confirmation emails to users simultaneously during parallel running.
Correct approach: Disable email sending in the new system or route side effects only through the old system during parallel running.
Root cause: Not realizing that uncontrolled side effects cause duplicated actions and user confusion.
#2 Duplicating all microservices regardless of importance or change scope.
Wrong approach: Deploy every microservice twice for parallel running, even unchanged ones.
Correct approach: Only duplicate critical or updated microservices to optimize resources and reduce complexity.
Root cause: Assuming parallel running requires full system duplication without considering cost and complexity.
#3 Manually comparing outputs from old and new systems without automation.
Wrong approach: Team members read logs line by line to find differences after parallel running.
Correct approach: Use automated tools to log, compare, and alert on output differences efficiently.
Root cause: Underestimating the scale and speed needed for reliable output comparison.
Key Takeaways
Parallel running runs old and new systems side by side to safely test new versions before full migration.
It reduces risk by allowing comparison of outputs and catching errors early without disrupting users.
Managing data consistency and side effects carefully is critical to avoid corruption and duplication.
Automation in monitoring and comparison is essential for scaling parallel running in production.
Parallel running has tradeoffs in resource use and complexity, so selective application is common.