Parallel running in Microservices - Scalability & System Analysis

Scalability Analysis - Parallel running
Growth Table: Parallel Running in Microservices

Users / Traffic | What Changes?
100 users | Single microservice version runs; parallel running not needed.
10,000 users | Start parallel running new microservice version alongside old for testing and smooth transition.
1,000,000 users | Multiple parallel instances of old and new versions run; traffic split carefully; monitoring and rollback mechanisms critical.
100,000,000 users | Parallel running at scale requires automated deployment, canary releases, feature flags; orchestration tools manage many versions and services.

First Bottleneck

The first bottleneck in parallel running is the increased resource usage on servers and network. When every request is processed by two versions, CPU, memory, and bandwidth needs roughly double; with three versions they roughly triple. If this extra load is not planned for, it can overwhelm application servers and increase latency.

Scaling Solutions
  • Horizontal scaling: Add more servers or containers to distribute load of parallel versions.
  • Load balancing: Use smart load balancers to route traffic between versions efficiently.
  • Feature flags and canary releases: Gradually shift traffic to new versions to reduce risk and resource spikes.
  • Resource isolation: Use container orchestration (e.g., Kubernetes) to allocate resources per version and avoid interference.
  • Monitoring and auto-scaling: Track resource usage and scale instances automatically to handle load.
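
The gradual traffic shift mentioned above can be sketched as a deterministic, hash-based split. This is a minimal illustration, not how any particular load balancer works: the function names `pick_version` and `split_counts`, the string user ids, and the 100-bucket scheme are all assumptions made for the example.

```python
import hashlib


def pick_version(user_id: str, new_version_percent: int) -> str:
    """Return "new" for roughly new_version_percent% of users, else "old"."""
    if not 0 <= new_version_percent <= 100:
        raise ValueError("new_version_percent must be between 0 and 100")
    # Hash the user id so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "new" if bucket < new_version_percent else "old"


def split_counts(user_ids, new_version_percent):
    """Count how many of the given users would be routed to each version."""
    counts = {"old": 0, "new": 0}
    for uid in user_ids:
        counts[pick_version(uid, new_version_percent)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical user ids; about 10% should land on the new version.
    print(split_counts([f"user-{i}" for i in range(10_000)], 10))
```

Hashing the user id rather than picking randomly per request keeps each user on one version, which makes differences easier to trace back to a version.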
Back-of-Envelope Cost Analysis

Assuming 1 server handles ~3000 concurrent connections:

  • At 10,000 users, running 2 versions in parallel needs ~7 servers (10,000 users * 2 versions / 3000 users per server).
  • At 1,000,000 users, parallel running 2 versions requires ~667 servers.
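  • Network bandwidth doubles with parallel running; if each request is 100KB and we assume each of 1M users sends one request per second, that is ~100GB/s for a single version and ~200GB/s when both versions receive every request.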
  • Storage for logs and metrics also doubles; plan for increased disk and database capacity.
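
The arithmetic above can be reproduced with a short script. This is a sketch under stated assumptions: the helper names `servers_needed` and `bandwidth_gb_per_s` are made up for illustration, the bandwidth figure assumes one request per user per second (the text does not give a request rate), and units are decimal (1GB = 1,000,000KB).

```python
import math


def servers_needed(users: int, versions: int, connections_per_server: int = 3000) -> int:
    """Servers required when every user is served by every version."""
    return math.ceil(users * versions / connections_per_server)


def bandwidth_gb_per_s(users: int, request_kb: float,
                       requests_per_user_per_s: float = 1.0,
                       versions: int = 1) -> float:
    """Total traffic in GB/s; the per-user request rate is an assumption."""
    total_kb = users * request_kb * requests_per_user_per_s * versions
    return total_kb / 1_000_000  # decimal units: 1 GB = 1,000,000 KB


if __name__ == "__main__":
    print(servers_needed(10_000, 2))                       # 7
    print(servers_needed(1_000_000, 2))                    # 667
    print(bandwidth_gb_per_s(1_000_000, 100))              # 100.0
    print(bandwidth_gb_per_s(1_000_000, 100, versions=2))  # 200.0
```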
Interview Tip

When discussing parallel running scalability, start by explaining why parallel running is used (safe upgrades, testing). Then identify the resource overhead as the first bottleneck. Next, describe how horizontal scaling and orchestration tools help manage multiple versions. Finally, mention monitoring and gradual rollout strategies to minimize risk and cost.

Self Check

Your database handles 1,000 QPS. Traffic grows 10x because a new microservice version is running in parallel. What do you do first?

Answer: Add read replicas and implement caching to reduce load on the primary database before scaling application servers. This addresses the database bottleneck caused by increased queries from parallel versions.
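
To see why caching and read replicas come first, it can help to estimate how much load still reaches the primary. The model below is a rough sketch, not a sizing tool: the function name `primary_qps`, the read fraction, the cache hit ratio, and the even spread of cache misses across the primary and its replicas are all assumptions chosen for illustration.

```python
def primary_qps(total_qps: float, read_fraction: float,
                cache_hit_ratio: float, read_replicas: int) -> float:
    """Estimate queries per second that still hit the primary database."""
    reads = total_qps * read_fraction
    writes = total_qps - reads  # all writes go to the primary
    cache_misses = reads * (1 - cache_hit_ratio)
    # Assumption: cache misses are spread evenly over primary + replicas.
    reads_on_primary = cache_misses / (read_replicas + 1)
    return writes + reads_on_primary


if __name__ == "__main__":
    # 10x growth on a 1,000 QPS database, with hypothetical ratios:
    # 90% reads, 80% cache hit ratio, 2 read replicas.
    print(round(primary_qps(10_000, 0.9, 0.8, 2)))
```

With these made-up ratios the primary sees roughly 1,600 QPS instead of 10,000. That is still above the original 1,000 QPS limit, which shows that write load may need separate attention.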

Key Result
Parallel running increases resource usage significantly; the first bottleneck is server resource limits. Horizontal scaling, load balancing, and orchestration are key to managing growth safely.

Practice

1. What is the main purpose of parallel running in microservices?
easy
A. To run old and new systems together to ensure smooth transition
B. To replace the old system immediately without testing
C. To run only the new system and discard the old one
D. To run multiple unrelated services in parallel

Solution

  1. Step 1: Understand the concept of parallel running

    Parallel running means running old and new systems side by side to compare their outputs and ensure the new system works correctly.
  2. Step 2: Identify the purpose in microservices

    This approach helps catch errors and ensures a smooth transition before fully switching to the new system.
  3. Final Answer:

    To run old and new systems together to ensure smooth transition -> Option A
  4. Quick Check:

    Parallel running = run old and new systems together [OK]
Hint: Parallel running means running old and new systems side by side [OK]
Common Mistakes:
  • Thinking parallel running means immediate replacement
  • Confusing parallel running with running unrelated services
  • Assuming old system is discarded immediately
2. Which of the following is the correct way to implement parallel running in a microservices upgrade?
easy
A. Deploy new microservice version alongside old one and route a copy of requests to both
B. Stop old microservice and deploy new one immediately
C. Deploy new microservice and ignore old service logs
D. Run new microservice only during off-peak hours

Solution

  1. Step 1: Understand deployment in parallel running

    Parallel running requires both old and new versions to run simultaneously to compare results.
  2. Step 2: Identify correct routing method

    Routing a copy of requests to both versions allows output comparison without disrupting users.
  3. Final Answer:

    Deploy new microservice version alongside old one and route a copy of requests to both -> Option A
  4. Quick Check:

    Parallel running = deploy both and route requests to both [OK]
Hint: Route requests to both old and new services in parallel [OK]
Common Mistakes:
  • Stopping old service before testing new one
  • Ignoring logs from old service
  • Running new service only at specific times
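
Routing a copy of each request to both versions can be sketched as follows. This is a simplified, synchronous illustration: `handle_with_shadow`, `old_service`, and `new_service` are hypothetical names, and a real system would more likely send the shadow request asynchronously so it adds no latency for the user.

```python
from typing import Callable, Optional, Tuple

Handler = Callable[[dict], dict]


def handle_with_shadow(request: dict, old_service: Handler,
                       new_service: Handler) -> Tuple[dict, Optional[dict]]:
    """Serve the user from the old version and mirror the request to the new one."""
    primary_response = old_service(request)
    try:
        # Pass a copy so the new version cannot mutate the original request.
        shadow_response = new_service(dict(request))
    except Exception:
        # A failure in the new version must never affect the user.
        shadow_response = None
    return primary_response, shadow_response


if __name__ == "__main__":
    # Hypothetical handlers standing in for the two deployed versions.
    old = lambda req: {"total": req["qty"] * 5}
    new = lambda req: {"total": req["qty"] * 5}
    print(handle_with_shadow({"qty": 3}, old, new))
```

Only the old version's response is returned to the user; the shadow response exists purely for comparison.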
3. Consider a microservice system where requests are sent to both old and new versions during parallel running. If the old service returns response A and the new service returns response B, what should the system do?
medium
A. Ignore the difference and continue using the new service
B. Switch back to the old service permanently
C. Stop the old service immediately
D. Log the difference and alert engineers for investigation

Solution

  1. Step 1: Understand output comparison in parallel running

    Parallel running compares outputs to detect discrepancies between old and new services.
  2. Step 2: Decide action on output mismatch

    If outputs differ, the system should log the difference and alert engineers to investigate before switching fully.
  3. Final Answer:

    Log the difference and alert engineers for investigation -> Option D
  4. Quick Check:

    Output mismatch = log and alert [OK]
Hint: Log and alert on output differences during parallel running [OK]
Common Mistakes:
  • Ignoring output differences
  • Stopping old service too early
  • Switching back permanently without investigation
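
The "log and alert" step might look like the sketch below. The names `compare_responses` and `alert` are assumptions for this example; `alert` stands in for whatever paging or chat notification a team actually uses, and responses are assumed to be flat dictionaries.

```python
import logging

logger = logging.getLogger("parallel_run")
_MISSING = object()  # marks a field that is absent from one response


def compare_responses(request_id: str, old_response: dict,
                      new_response: dict, alert) -> bool:
    """Return True if both versions agree; otherwise log, alert, and return False."""
    if old_response == new_response:
        return True
    all_keys = set(old_response) | set(new_response)
    diff_keys = sorted(
        key for key in all_keys
        if old_response.get(key, _MISSING) != new_response.get(key, _MISSING)
    )
    logger.warning("mismatch request=%s fields=%s", request_id, diff_keys)
    alert(f"Parallel-run mismatch on request {request_id}: fields {diff_keys}")
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Response A from the old service, response B from the new one.
    compare_responses("req-1", {"total": 15}, {"total": 16}, alert=print)
```

The function only reports the difference; deciding whether to roll back or continue stays with the engineers who investigate.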
4. A team implemented parallel running but noticed that the new service never receives any requests. What is the most likely cause?
medium
A. The new service crashed immediately after deployment
B. The routing logic is only sending requests to the old service
C. The old service is not logging requests
D. The new service is slower than the old one

Solution

  1. Step 1: Analyze routing in parallel running

    For parallel running, requests must be routed to both old and new services simultaneously.
  2. Step 2: Identify why new service gets no requests

    If new service never receives requests, routing likely sends all traffic only to old service.
  3. Final Answer:

    The routing logic is only sending requests to the old service -> Option B
  4. Quick Check:

    No requests to new service = routing issue [OK]
Hint: Check routing logic if new service gets no requests [OK]
Common Mistakes:
  • Assuming new service crashed without checking logs
  • Blaming old service logs
  • Thinking speed affects request routing
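
One quick way to confirm a routing problem is to count requests per version and flag any version that received none. The sketch below assumes a hypothetical request log where each entry records the version that handled it; `silent_versions` is an illustrative name, not part of any tool.

```python
from collections import Counter


def silent_versions(expected_versions, request_log):
    """Return the expected versions that handled zero requests."""
    counts = Counter(entry["version"] for entry in request_log)
    return [version for version in expected_versions if counts[version] == 0]


if __name__ == "__main__":
    # Hypothetical log: every request was handled by the old version only.
    log = [{"version": "old"}, {"version": "old"}, {"version": "old"}]
    print(silent_versions(["old", "new"], log))  # ['new']
```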
5. You are designing a parallel running strategy for a microservices system with high traffic. Which approach best balances safety and performance?
hard
A. Route 100% of traffic to new service and keep old service idle
B. Run new service only during low traffic hours without output comparison
C. Route 10% of traffic to new service and 90% to old service, compare outputs, then gradually increase new service traffic
D. Stop old service immediately and monitor new service logs

Solution

  1. Step 1: Understand gradual traffic shifting in parallel running

    Gradually increasing traffic to the new service while comparing outputs reduces risk and performance impact.
  2. Step 2: Evaluate options for safety and performance

    Routing a small portion initially and increasing after validation balances safety and system load.
  3. Final Answer:

    Route 10% of traffic to new service and 90% to old service, compare outputs, then gradually increase new service traffic -> Option C
  4. Quick Check:

    Gradual traffic shift with output comparison = safe and performant [OK]
Hint: Start small traffic to new service, compare, then increase [OK]
Common Mistakes:
  • Switching 100% traffic immediately
  • Skipping output comparison
  • Stopping old service too early
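
The gradual increase described in Option C can be expressed as a small decision function. This is a sketch under assumptions: the stage percentages, the mismatch threshold, and the name `next_traffic_percent` are illustrative choices, not values from the text beyond the initial 10%.

```python
def next_traffic_percent(current_percent: int, mismatch_rate: float,
                         stages=(10, 25, 50, 100),
                         max_mismatch_rate: float = 0.001) -> int:
    """Decide the next share of traffic for the new version.

    Returns 0 (roll back) if too many outputs differ, otherwise the next
    stage above the current share, or the current share if already at the top.
    """
    if mismatch_rate > max_mismatch_rate:
        return 0  # too many differences: send all traffic back to the old version
    for stage in stages:
        if stage > current_percent:
            return stage
    return current_percent


if __name__ == "__main__":
    print(next_traffic_percent(10, 0.0))   # 25
    print(next_traffic_percent(10, 0.05))  # 0
```

Tying each increase to the measured mismatch rate means the new version only gains traffic after its outputs have been validated at the previous stage.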