
Traffic management (routing, splitting) in Microservices - System Design Exercise

Design: Microservices Traffic Management System
The design focuses on the traffic routing and splitting layer in front of the microservices. It does not cover the microservices' internal logic or database design, and it does not include client-side load balancing.
Functional Requirements
FR1: Route incoming client requests to appropriate microservice instances based on service version or environment.
FR2: Support traffic splitting to gradually shift traffic between different service versions (e.g., canary releases).
FR3: Allow dynamic configuration of routing rules without downtime.
FR4: Provide observability for traffic distribution and routing decisions.
FR5: Ensure minimal added latency (p99 < 50ms) for routing decisions.
FR6: Handle at least 10,000 requests per second with 99.9% availability.
Non-Functional Requirements
NFR1: System must support zero downtime updates to routing rules.
NFR2: Latency overhead for routing must be minimal to avoid user impact.
NFR3: High availability with failover for routing components.
NFR4: Scalable to handle traffic spikes up to 50,000 requests per second.
NFR5: Security: only authorized operators can change routing rules.
Think Before You Design
Key Components
API Gateway or Ingress Controller
Service Registry or Discovery
Configuration Management System
Traffic Router or Proxy
Monitoring and Logging System
Authentication and Authorization for config changes
Design Patterns
Canary Deployment
Blue-Green Deployment
Feature Flags
Circuit Breaker
Sidecar Proxy Pattern
Reference Architecture
Diagram (not reproduced): API Gateway / Ingress Controller → Traffic Router / Proxy → Microservice Instances (v1, v2, ...), with Monitoring & Logging and an Auth Service alongside.
Components
API Gateway / Ingress Controller
NGINX, Envoy, or Kong
Entry point for client requests, performs initial routing and security checks.
Traffic Router / Proxy
Envoy Proxy or custom router
Makes routing and traffic splitting decisions based on configured rules.
Configuration Management
Consul, etcd, or custom config service
Stores routing rules and traffic split percentages, supports dynamic updates.
Microservice Instances
Docker containers or Kubernetes pods
Run different versions of microservices to receive routed traffic.
Monitoring & Logging
Prometheus, Grafana, ELK stack
Collects metrics and logs for traffic distribution and routing decisions.
Authentication & Authorization Service
OAuth2 server or RBAC system
Controls who can update routing configurations.
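To make the Traffic Router / Proxy's job concrete, here is a minimal Python sketch of a weighted split: it picks a version for a service with probability proportional to its weight, then picks a healthy instance of that version. The rule format (service → {version: weight}), the Instance type, and the function names are hypothetical; Envoy, NGINX, and Kong each express weighted routing in their own configuration rather than in code like this.

```python
import bisect
import random
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class Instance:
    # Hypothetical instance record, loosely mirroring the ServiceInstance entity.
    service_name: str
    version: str
    endpoint: str
    healthy: bool


def pick_version(weights: dict[str, int], rng: random.Random) -> str:
    """Weighted choice: version v wins with probability weights[v] / sum(weights)."""
    versions = list(weights)
    cumulative = list(accumulate(weights[v] for v in versions))
    total = cumulative[-1] if cumulative else 0
    if total <= 0:
        raise ValueError("at least one version needs a positive weight")
    point = rng.randrange(total)  # integer in [0, total)
    # The first cumulative bound strictly greater than point marks the chosen bucket.
    return versions[bisect.bisect_right(cumulative, point)]


def route(service: str, rules: dict[str, dict[str, int]],
          instances: list[Instance], rng: random.Random) -> Instance:
    """Choose a version from the service's split, then a healthy instance of it."""
    version = pick_version(rules[service], rng)
    candidates = [i for i in instances
                  if i.service_name == service and i.version == version and i.healthy]
    if not candidates:
        raise LookupError(f"no healthy instance for {service} {version}")
    return rng.choice(candidates)
```

Because each request draws a fresh random number, one user can hit different versions on consecutive requests; when a rollout has to be per user rather than per request, a stable hash of a user key replaces the random draw.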
Request Flow
1. Client sends request to API Gateway.
2. API Gateway forwards request to Traffic Router.
3. Traffic Router fetches routing rules from Configuration Management.
4. Traffic Router decides target microservice version based on rules and traffic split percentages.
5. Traffic Router forwards request to selected microservice instance.
6. Microservice processes request and sends response back through Traffic Router and API Gateway.
7. Monitoring system collects metrics on routing decisions and traffic distribution.
8. Authorized operators update routing rules via Configuration Management secured by Auth Service.
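Steps 3 and 8 are where FR3/NFR1 (rule changes without downtime) and NFR5 (only authorized operators) meet. One common approach, sketched below under assumptions, is to check the caller's role, validate the complete new rule set, then replace a single reference to an immutable snapshot and record an audit entry, so a lookup sees either the old rules or the new ones and never a half-applied mix. The RuleStore class, the "traffic-admin" role name, and the audit tuple layout are hypothetical; in practice the store would be Consul, etcd, or a custom config service, and the role check would go through the Auth Service.

```python
import threading
import time
from types import MappingProxyType
from typing import Mapping

Splits = Mapping[str, Mapping[str, int]]  # service -> {version: weight}


class RuleStore:
    """Holds an immutable snapshot of routing rules; updates replace the whole snapshot."""

    def __init__(self, initial: dict[str, dict[str, int]]):
        self._validate(initial)
        self._snapshot: Splits = self._freeze(initial)
        self._write_lock = threading.Lock()  # serialises writers; readers never take it
        # (user_id, action, timestamp, details), loosely following the AuditLog entity.
        self.audit_log: list[tuple[str, str, float, str]] = []

    @staticmethod
    def _validate(rules: dict[str, dict[str, int]]) -> None:
        for service, weights in rules.items():
            if not weights or any(w < 0 for w in weights.values()) or sum(weights.values()) != 100:
                raise ValueError(f"weights for {service} must be non-negative and sum to 100")

    @staticmethod
    def _freeze(rules: dict[str, dict[str, int]]) -> Splits:
        # Read-only views so nobody can mutate a snapshot that readers are using.
        return MappingProxyType({s: MappingProxyType(dict(w)) for s, w in rules.items()})

    def current(self) -> Splits:
        # A reader takes one reference; that snapshot stays consistent even if an update lands.
        return self._snapshot

    def update(self, user_id: str, roles: set[str], new_rules: dict[str, dict[str, int]]) -> None:
        if "traffic-admin" not in roles:  # hypothetical role name
            raise PermissionError(f"{user_id} may not change routing rules")
        self._validate(new_rules)  # reject bad config before it goes live
        frozen = self._freeze(new_rules)
        with self._write_lock:
            self._snapshot = frozen  # single reference assignment acts as the swap
            self.audit_log.append((user_id, "update_rules", time.time(), repr(new_rules)))
```

Rejected updates (unauthorized or invalid) never touch the live snapshot, which is what keeps a bad change from causing an outage.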
Database Schema
Entities:
- RoutingRule: id, service_name, version, criteria (e.g., user segment), traffic_percentage, active_flag
- ServiceInstance: id, service_name, version, endpoint, health_status
- UserRole: id, user_id, role_name
- AuditLog: id, user_id, action, timestamp, details
Relationships:
- RoutingRule linked to ServiceInstance by service_name and version
- UserRole linked to users for authorization
- AuditLog records configuration changes by users
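To make the RoutingRule/ServiceInstance relationship concrete, the sketch below models both entities as Python dataclasses (fields taken from the entity list) and checks two invariants a config service might enforce before accepting rules: the active traffic_percentage values for each (service_name, criteria) pair sum to 100, and every active version receiving traffic has at least one healthy instance. Both invariants and the "healthy" health_status value are assumptions, not part of the original schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RoutingRule:
    id: int
    service_name: str
    version: str
    criteria: str             # e.g. a user segment
    traffic_percentage: int
    active_flag: bool


@dataclass
class ServiceInstance:
    id: int
    service_name: str
    version: str
    endpoint: str
    health_status: str        # assumed values: "healthy" / "unhealthy"


def check_rules(rules: list[RoutingRule], instances: list[ServiceInstance]) -> list[str]:
    """Return human-readable problems; an empty list means the rule set looks consistent."""
    problems: list[str] = []
    totals: dict[tuple[str, str], int] = defaultdict(int)
    healthy = {(i.service_name, i.version) for i in instances if i.health_status == "healthy"}
    for r in rules:
        if not r.active_flag:
            continue  # inactive rules carry no traffic
        totals[(r.service_name, r.criteria)] += r.traffic_percentage
        # RoutingRule joins ServiceInstance on (service_name, version).
        if r.traffic_percentage > 0 and (r.service_name, r.version) not in healthy:
            problems.append(f"{r.service_name} {r.version}: no healthy instance")
    for (service, criteria), total in totals.items():
        if total != 100:
            problems.append(f"{service} [{criteria}]: active percentages sum to {total}, expected 100")
    return problems
```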
Scaling Discussion
Bottlenecks
Traffic Router becomes a bottleneck under very high request rates.
Configuration Management latency affects routing decision speed.
Monitoring system overload with high volume of metrics.
API Gateway limits throughput if not horizontally scalable.
Solutions
Deploy multiple Traffic Router instances behind a load balancer for horizontal scaling.
Cache routing rules locally in Traffic Router with TTL to reduce config fetch latency.
Use sampling and aggregation in Monitoring to reduce data volume.
Use scalable API Gateway solutions with autoscaling and rate limiting.
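The local-cache solution can be sketched as follows, assuming the config store is reachable through some fetch function. The TTLCache class is hypothetical, and the clock is injectable only so the behaviour is easy to test. The TTL is the key trade-off: it bounds how stale a router's rules can be (a rule change takes up to ttl seconds to reach every router instance), while the config store sees at most about one fetch per router per TTL instead of one per request. Serving the last known rules when the store is unreachable is an added assumption that supports NFR3.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Serve a locally cached value; refetch from the config store only after ttl seconds."""

    def __init__(self, fetch: Callable[[], T], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at = float("-inf")  # forces a fetch on first use

    def get(self) -> T:
        now = self._clock()
        if now >= self._expires_at:
            try:
                self._value = self._fetch()
            except Exception:
                # Store unreachable: with nothing cached we must fail; otherwise keep
                # serving the stale rules and back off for one TTL before retrying.
                if self._value is None:
                    raise
            self._expires_at = now + self._ttl
        return self._value
```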
Interview Tips
Time: 10 minutes for requirements and clarifications, 15 minutes for architecture and components, 10 minutes for scaling and trade-offs, 10 minutes for Q&A.
Clarify routing criteria and traffic splitting needs early.
Explain choice of proxy/router technology and dynamic config management.
Discuss how zero downtime config updates are achieved.
Highlight observability and security considerations.
Address scaling bottlenecks with concrete solutions.

Practice

1. What is the main purpose of traffic routing in microservices architecture?
easy
A. To direct incoming requests to specific services based on rules
B. To store data persistently across services
C. To encrypt communication between services
D. To monitor service health and uptime

Solution

  1. Step 1: Understand traffic routing

    Traffic routing means sending requests to the right service based on rules like URL path or user type.
  2. Step 2: Identify the main purpose

    Routing helps control where requests go, ensuring they reach the correct microservice.
  3. Final Answer:

    To direct incoming requests to specific services based on rules -> Option A
  4. Quick Check:

    Routing = directing requests
Hint: Routing means sending requests to the right place
Common Mistakes:
  • Confusing routing with data storage
  • Thinking routing encrypts data
  • Mixing routing with monitoring
2. Which of the following is a correct way to define a traffic splitting rule in a service mesh configuration?
easy
A. split: - weight: 50 service: v1 - weight: 50 service: v2
B. route: path: /api service: v1
C. split: - service: v1 - service: v2 - weight: 100
D. route: weight: 100 service: v1 path: /home

Solution

  1. Step 1: Understand traffic splitting syntax

    Traffic splitting uses weights to divide requests between service versions, e.g., 50% to v1 and 50% to v2.
  2. Step 2: Identify correct syntax

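    Option A gives each entry its own service and weight (50 and 50), which is exactly what a splitting rule needs. Option B is a plain path route with no weights, option C puts the weight in a separate entry that names no service, and option D attaches a single 100% weight to a path route, so nothing is actually split.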
  3. Final Answer:

    split: - weight: 50 service: v1 - weight: 50 service: v2 -> Option A
  4. Quick Check:

    Splitting uses weights per service
Hint: Splitting needs weights assigned to each service
Common Mistakes:
  • Confusing routing rules with splitting rules
  • Missing weights in splitting definitions
  • Placing weights outside service entries
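The options in question 2 use a simplified, generic config shape rather than the schema of any particular service mesh. Writing option A and option C as Python data makes the difference visible: a split is a list of entries, and each entry carries its own service and weight. The validate_split function below is a hypothetical check for that simplified shape only, not Istio's, Linkerd's, or any other mesh's API.

```python
def validate_split(config: dict) -> list[str]:
    """Check the simplified 'split' shape from the quiz options; return a list of errors."""
    entries = config.get("split")
    if not isinstance(entries, list) or not entries:
        return ["'split' must be a non-empty list of entries"]
    errors = []
    for n, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"entry {n} is not a mapping")
            continue
        if "service" not in entry:
            errors.append(f"entry {n} has no service")
        weight = entry.get("weight")
        if not isinstance(weight, int) or weight < 0:
            errors.append(f"entry {n} needs a non-negative integer weight")
    return errors


option_a = {"split": [{"weight": 50, "service": "v1"}, {"weight": 50, "service": "v2"}]}
option_c = {"split": [{"service": "v1"}, {"service": "v2"}, {"weight": 100}]}
print(validate_split(option_a))  # []
print(validate_split(option_c))  # three errors: entries 0 and 1 lack weights, entry 2 lacks a service
```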
3. Given this traffic splitting configuration, what percentage of requests go to service v2?
split:
  - weight: 70
    service: v1
  - weight: 30
    service: v2
medium
A. 100%
B. 70%
C. 50%
D. 30%

Solution

  1. Step 1: Read the weights for each service

    Service v1 has weight 70, and service v2 has weight 30.
  2. Step 2: Calculate percentage for v2

    Total weight = 70 + 30 = 100. So, v2 gets 30/100 = 30% of requests.
  3. Final Answer:

    30% -> Option D
  4. Quick Check:

    Weight 30 out of a total of 100 means 30% of traffic
Hint: Traffic % = service weight / total weight × 100
Common Mistakes:
  • Adding weights incorrectly
  • Assuming equal split without weights
  • Confusing service names
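The hint's formula also covers weights that do not add up to 100, since each weight is divided by the total. A minimal Python sketch, with a hypothetical function name (whether a given proxy accepts weights that do not sum to 100 depends on the product):

```python
def split_percentages(weights: dict[str, float]) -> dict[str, float]:
    """pct(v) = 100 * weight(v) / sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {version: 100 * w / total for version, w in weights.items()}


print(split_percentages({"v1": 70, "v2": 30}))  # {'v1': 70.0, 'v2': 30.0}
print(split_percentages({"v1": 1, "v2": 3}))    # {'v1': 25.0, 'v2': 75.0}
```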
4. You have this routing rule:
route:
  path: /user
  service: user-service-v1
  weight: 100
But requests to /user/profile are not reaching user-service-v1. What is the likely problem?
medium
A. Service name is incorrect and causes failure
B. Weight should be split between multiple services
C. The path rule matches only exact /user, not subpaths like /user/profile
D. Routing rules cannot use path matching

Solution

  1. Step 1: Analyze the path matching rule

    If the router treats path as an exact match, the rule matches only /user. /user/profile is a subpath, so it will not match unless prefix or wildcard matching is configured.
  2. Step 2: Identify why requests fail

    Since /user/profile does not match exactly /user, requests do not route to user-service-v1.
  3. Final Answer:

    The path rule matches only exact /user, not subpaths like /user/profile -> Option C
  4. Quick Check:

    Exact path matching excludes subpaths
Hint: Exact path matches exclude subpaths unless prefix or wildcard matching is used
Common Mistakes:
  • Assuming weight must be split
  • Blaming service name without checking
  • Thinking routing ignores paths
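The difference between exact and prefix matching fits in a few lines of Python. These are generic, hypothetical helpers rather than the matching rules of NGINX, Envoy, or Kong, each of which defines its own semantics. Note that a naive startswith("/user") would also match /username, so the prefix check below only matches on a path-segment boundary.

```python
def exact_match(rule_path: str, request_path: str) -> bool:
    """Match only the identical path."""
    return request_path == rule_path


def prefix_match(rule_path: str, request_path: str) -> bool:
    """Match the rule path and anything below it, on segment boundaries only."""
    base = rule_path.rstrip("/")
    return request_path == base or request_path.startswith(base + "/")


print(exact_match("/user", "/user/profile"))   # False
print(prefix_match("/user", "/user/profile"))  # True
print(prefix_match("/user", "/username"))      # False
```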
5. You want to gradually roll out a new version of a payment service to 10% of users while keeping 90% on the old version. Which traffic management strategy is best suited for this?
hard
A. Use routing based on URL path to send 10% of requests to new service
B. Use traffic splitting with weights 90% to old and 10% to new service
C. Deploy both versions without traffic control and monitor errors
D. Use a load balancer that randomly sends requests without weights

Solution

  1. Step 1: Understand gradual rollout needs

    Gradual rollout means controlling what percentage of users see the new version.
  2. Step 2: Choose traffic management method

    Traffic splitting with weights allows precise control of request percentages to each version.
  3. Step 3: Evaluate other options

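    Routing by URL path cannot split traffic by percentage, since every request to a given path is treated the same way. Random load balancing without weights gives no control over the ratio, and deploying both versions without traffic control leaves exposure to chance, so far more than 10% of users could reach the new version.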
  4. Final Answer:

    Use traffic splitting with weights 90% to old and 10% to new service -> Option B
  5. Quick Check:

    Splitting controls rollout percentages
Hint: Use weighted splitting for gradual rollout
Common Mistakes:
  • Using URL path routing for percentage split
  • Ignoring traffic control during rollout
  • Relying on random load balancing
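Question 5 asks for 10% of users, not 10% of requests. One common way to get that with weighted splitting is to hash a stable user key into 100 buckets and send the lowest buckets to the new version, so each user consistently sees one version. The sketch below assumes SHA-256 bucketing and a hypothetical salt string "payment-canary"; it is not the mechanism of any specific proxy.

```python
import hashlib


def bucket(user_id: str, salt: str = "payment-canary") -> int:
    """Map a user to a stable bucket in [0, 100) using a hash of salt + user_id."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def version_for(user_id: str, new_percent: int) -> str:
    """Users whose bucket is below new_percent get the new version; the rest stay on old."""
    return "new" if bucket(user_id) < new_percent else "old"
```

A useful property of this design: raising new_percent from 10 to 25 keeps every user who was already on the new version there, so the rollout only ever adds users rather than flipping people back and forth.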