
Why service mesh manages inter-service traffic in Microservices - Design It to Understand It


Design: Service Mesh for Microservices Inter-Service Traffic Management

Focus on managing inter-service traffic within a microservices architecture using a service mesh. Out of scope: service development, database design, external client communication.

Functional Requirements

FR1: Manage communication between multiple microservices securely and reliably

FR2: Provide observability for inter-service calls (metrics, tracing, logging)

FR3: Enable traffic control features like load balancing, retries, and circuit breaking

FR4: Support secure communication with mutual TLS authentication

FR5: Allow dynamic routing and version-based traffic splitting for deployments

Non-Functional Requirements

NFR1: Handle up to 10,000 inter-service requests per second

NFR2: Ensure p99 latency for inter-service calls under 100ms

NFR3: Achieve 99.9% availability for service communication

NFR4: Minimal impact on existing microservices code (no code changes preferred)
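A quick back-of-the-envelope check turns NFR1 to NFR3 into concrete budgets. The sketch below is illustrative and rests on stated assumptions: a 30-day month for the availability budget, availability measured as a request success ratio, and Little's law (in-flight requests = arrival rate × time in system) with the 100 ms p99 used as a pessimistic stand-in for time in system. The function names are hypothetical.

```python
# Back-of-the-envelope budgets derived from NFR1-NFR3.
# Assumptions: 30-day month; 100 ms p99 used as a worst-case time in system.

REQUESTS_PER_SECOND = 10_000      # NFR1
P99_LATENCY_SECONDS = 0.100       # NFR2
AVAILABILITY = 0.999              # NFR3


def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Minutes of allowed unavailability over a period of `days` days."""
    total_minutes = days * 24 * 60
    return (1 - availability) * total_minutes


def max_in_flight(rate_per_second: float, latency_seconds: float) -> float:
    """Little's law: concurrency = arrival rate x time in system.

    Most requests finish faster than p99, so this overestimates concurrency."""
    return rate_per_second * latency_seconds


def failed_requests_allowed(rate_per_second: float, availability: float,
                            days: int = 30) -> float:
    """Requests that may fail in the period while still meeting the target,
    assuming availability is measured as a request success ratio."""
    total_requests = rate_per_second * days * 24 * 3600
    return (1 - availability) * total_requests


if __name__ == "__main__":
    print(f"Downtime budget: {downtime_budget_minutes(AVAILABILITY):.1f} min / 30 days")
    print(f"In-flight requests (upper bound): "
          f"{max_in_flight(REQUESTS_PER_SECOND, P99_LATENCY_SECONDS):.0f}")
    print(f"Failed requests allowed: "
          f"{failed_requests_allowed(REQUESTS_PER_SECOND, AVAILABILITY):,.0f}")
```

With these numbers, 99.9% leaves about 43 minutes of downtime per 30 days, and roughly 1,000 requests are in flight at once, which is a useful starting figure when sizing proxy connection pools.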

Think Before You Design


Key Components

Sidecar proxies deployed alongside each microservice

Control plane to manage configuration and policies

Certificate authority for mutual TLS

Telemetry collection and visualization tools

Service registry for service discovery

Design Patterns

Sidecar proxy pattern

Circuit breaker and retry patterns

Mutual TLS for secure communication

Canary deployment and traffic splitting

Observability with distributed tracing
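The circuit breaker pattern can be sketched as a small state machine. This is a minimal, hypothetical illustration: `CircuitBreaker`, `failure_threshold`, and `reset_timeout` are names invented here. In a mesh the equivalent behavior runs inside the sidecar and is configured declaratively rather than written in service code; Envoy, for example, offers connection-limit circuit breakers and outlier detection.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical consecutive-failure circuit breaker (not thread-safe).

    CLOSED    -> calls pass through; consecutive failures are counted.
    OPEN      -> calls fail fast until `reset_timeout` seconds have passed.
    HALF_OPEN -> the next call is a trial: success closes, failure re-opens.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock                    # injectable for testing
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"      # cooldown over: allow a trial request
            else:
                raise CircuitOpenError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"

    def _on_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

Failing fast while the circuit is open keeps callers from piling requests onto a service that is already unhealthy, and the half-open trial lets traffic resume automatically once it recovers.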

Reference Architecture

Service A  <--->  Sidecar Proxy A  <=== mTLS ===>  Sidecar Proxy B  <--->  Service B
                         ^                                ^
                         |                                |
                         +-- Service Mesh Control Plane --+

- Each microservice runs with a sidecar proxy.
- Sidecars handle all incoming and outgoing traffic.
- Control plane configures proxies with routing, security, and telemetry rules.
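One of the simplest routing rules the control plane hands to a sidecar is a weighted split between service versions, which is how canary deployments (FR5) are usually expressed. A minimal sketch, assuming a hypothetical rule format of `(version, weight)` pairs with non-negative integer weights; real meshes express this declaratively (Istio, for example, uses weighted route destinations in a `VirtualService`).

```python
import random
from typing import List, Optional, Tuple

# Hypothetical canary rule: 90% of requests to v1, 10% to the v2 canary.
ROUTE: List[Tuple[str, int]] = [("v1", 90), ("v2", 10)]


def pick_version(route: List[Tuple[str, int]],
                 rng: Optional[random.Random] = None) -> str:
    """Pick a destination version with probability proportional to its weight."""
    total = sum(weight for _, weight in route)
    if total <= 0:
        raise ValueError("route needs at least one positive weight")
    rng = rng or random.Random()
    point = rng.randrange(total)      # uniform integer in [0, total)
    cumulative = 0
    for version, weight in route:
        cumulative += weight
        if point < cumulative:        # point falls inside this version's bucket
            return version
    raise AssertionError("unreachable: every point < total lands in a bucket")


if __name__ == "__main__":
    rng = random.Random(42)
    counts = {"v1": 0, "v2": 0}
    for _ in range(10_000):
        counts[pick_version(ROUTE, rng)] += 1
    print(counts)   # close to a 9,000 / 1,000 split
```

Shifting the canary forward is then just a matter of the control plane pushing new weights (for example 50/50, then 0/100); no service is redeployed to change the split.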

Components

Sidecar Proxy: Envoy Proxy

Intercepts and manages all network traffic for a microservice, enabling features like load balancing, retries, and security.
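The sketch below is a hypothetical, in-process stand-in for what a sidecar does on the outbound path: it wraps the upstream call with bounded retries, exponential backoff, and per-attempt latency recording. In a real mesh this logic lives in a separate proxy process such as Envoy, and traffic is redirected to it transparently (Istio uses iptables rules for this), which is why the service code does not change. `SidecarProxy`, `RetryableError`, and the 25 ms base backoff are assumptions made for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. 503, connection reset)."""


class SidecarProxy:
    """Toy outbound proxy: bounded retries with exponential backoff, plus latency telemetry."""

    def __init__(self, max_attempts: int = 3, base_backoff: float = 0.025,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_attempts = max_attempts
        self.base_backoff = base_backoff
        self.sleep = sleep                    # injectable for testing
        self.latencies: List[float] = []      # seconds per attempt, excluding backoff

    def forward(self, upstream: Callable[[], T]) -> T:
        attempt = 1
        while True:
            start = time.monotonic()
            try:
                return upstream()
            except RetryableError:
                if attempt >= self.max_attempts:
                    raise                     # retry budget exhausted: surface the failure
            finally:
                self.latencies.append(time.monotonic() - start)
            # Exponential backoff between attempts: 25 ms, 50 ms, 100 ms, ...
            self.sleep(self.base_backoff * 2 ** (attempt - 1))
            attempt += 1
```

Retries should be bounded and limited to idempotent requests, since retrying every failure multiplies load on a service that is already struggling; production proxies typically also add jitter to the backoff.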

Control Plane: Istio Control Plane

Manages configuration, policies, and certificates for sidecar proxies dynamically.
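"Dynamically" means the control plane pushes updates to running proxies instead of requiring restarts. The sketch below models that as a versioned publish/subscribe loop. It is a deliberate simplification of what Istio's control plane does over Envoy's xDS APIs, which stream configuration to proxies and have them acknowledge each version; `ControlPlane`, `Proxy`, and the `reviews.route` key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Proxy:
    """Hypothetical sidecar holding the latest config version it has accepted."""
    name: str
    version: int = 0
    config: Dict[str, Any] = field(default_factory=dict)

    def apply(self, version: int, config: Dict[str, Any]) -> bool:
        # Ignore stale or duplicate pushes; returning True acts as an acknowledgement.
        if version <= self.version:
            return False
        self.version, self.config = version, dict(config)   # shallow copy
        return True


class ControlPlane:
    """Holds the desired config and pushes every new version to all proxies."""

    def __init__(self) -> None:
        self.version = 0
        self.config: Dict[str, Any] = {}
        self.proxies: List[Proxy] = []

    def register(self, proxy: Proxy) -> None:
        self.proxies.append(proxy)
        if self.version:                  # a newly started proxy gets current state at once
            proxy.apply(self.version, self.config)

    def update(self, key: str, value: Any) -> int:
        """Change one setting, bump the version, push it; returns acknowledgements."""
        self.version += 1
        self.config[key] = value
        return sum(p.apply(self.version, self.config) for p in self.proxies)


if __name__ == "__main__":
    cp = ControlPlane()
    a, b = Proxy("sidecar-a"), Proxy("sidecar-b")
    cp.register(a)
    cp.register(b)
    acks = cp.update("reviews.route", [("v1", 90), ("v2", 10)])
    print(acks, a.version, b.version)
```

Version numbers let a proxy discard out-of-order pushes; a real control plane must also handle proxies that reject a config and rollouts that only partially succeed, which this sketch ignores.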

Certificate Authority: Istio CA or external PKI

Issues and rotates certificates for mutual TLS to secure inter-service communication.
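Rotation keeps certificates short-lived without interrupting traffic: each proxy requests a replacement well before the current certificate expires, so a slow CA delays renewal rather than requests. A minimal sketch of that timing decision, assuming a hypothetical policy of renewing once half the lifetime has elapsed, minus a random jitter so that a fleet of proxies does not hit the CA at the same moment; actual thresholds are configurable and vary by mesh.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(issued_at: datetime, expires_at: datetime,
                 renew_fraction: float = 0.5, jitter_fraction: float = 0.1,
                 rng: Optional[random.Random] = None) -> datetime:
    """When to request a new certificate (hypothetical policy).

    Renew after `renew_fraction` of the lifetime, minus up to `jitter_fraction`
    of the lifetime at random, to spread CA load across proxies.
    """
    rng = rng or random.Random()
    lifetime = expires_at - issued_at
    jitter = lifetime * (jitter_fraction * rng.random())
    return issued_at + lifetime * renew_fraction - jitter


def needs_renewal(now: datetime, renew_at: datetime) -> bool:
    return now >= renew_at


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    expiry = issued + timedelta(hours=24)          # hypothetical 24 h certificate
    renew_at = renewal_time(issued, expiry)
    print("renew at", renew_at.isoformat())        # between 09:36 and 12:00 UTC
```

Because renewal starts with half the lifetime still remaining, the proxy keeps serving traffic with its current, still-valid certificate even if the CA is slow or briefly unavailable.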

Telemetry System: Prometheus, Grafana, Jaeger

Collects metrics, logs, and traces from sidecars for observability.
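Checking NFR2 against the collected telemetry comes down to a percentile over observed latencies. The sketch below uses the nearest-rank method on raw samples for clarity; this is an assumption, since production systems such as Prometheus usually estimate percentiles from histogram buckets (for example with `histogram_quantile`) rather than keeping every sample. The function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def meets_latency_slo(samples_ms: Sequence[float], p99_budget_ms: float = 100.0) -> bool:
    """NFR2: p99 inter-service latency must stay under the budget."""
    return percentile(samples_ms, 99) < p99_budget_ms


if __name__ == "__main__":
    observed = [12.0, 15.5, 9.8, 240.0, 18.2, 11.1, 14.9, 13.3, 16.0, 10.4]
    print(percentile(observed, 99), meets_latency_slo(observed))
```

The p99 is dominated by the slowest 1% of calls, so a single slow hop (the 240 ms sample above) is enough to breach the budget even when the median looks healthy.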

Service Registry: Kubernetes API Server or Consul

Keeps track of available services and their endpoints for discovery.
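Discovery and load balancing can be sketched together as a registry that maps service names to endpoints with a health flag, from which the proxy picks healthy endpoints round-robin. This is a hypothetical in-memory stand-in: in Kubernetes the endpoint data comes from the API server (Services and EndpointSlices), and the proxy learns it from the control plane. The endpoint addresses are made up.

```python
from collections import defaultdict
from typing import Dict, List


class ServiceRegistry:
    """Hypothetical in-memory registry: service -> {endpoint: healthy?}."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Dict[str, bool]] = defaultdict(dict)
        self._next: Dict[str, int] = defaultdict(int)   # round-robin cursor per service

    def register(self, service: str, endpoint: str) -> None:
        self._endpoints[service][endpoint] = True        # new endpoints start healthy

    def deregister(self, service: str, endpoint: str) -> None:
        self._endpoints[service].pop(endpoint, None)

    def set_health(self, service: str, endpoint: str, healthy: bool) -> None:
        if endpoint in self._endpoints[service]:
            self._endpoints[service][endpoint] = healthy

    def healthy(self, service: str) -> List[str]:
        return sorted(ep for ep, ok in self._endpoints[service].items() if ok)

    def next_endpoint(self, service: str) -> str:
        """Round-robin over the endpoints that are currently healthy."""
        candidates = self.healthy(service)
        if not candidates:
            raise LookupError(f"no healthy endpoints for {service!r}")
        index = self._next[service] % len(candidates)
        self._next[service] += 1
        return candidates[index]


if __name__ == "__main__":
    registry = ServiceRegistry()
    for ep in ("10.0.0.1:8080", "10.0.0.2:8080"):
        registry.register("service-b", ep)
    print([registry.next_endpoint("service-b") for _ in range(4)])
    # ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080', '10.0.0.2:8080']
```

Marking an endpoint unhealthy removes it from rotation immediately, which is how failing instances stop receiving traffic without any change in the calling service.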

Request Flow

1. Microservice A sends a request to Microservice B.

2. The request is intercepted by Sidecar Proxy A.

3. Sidecar Proxy A applies routing rules and security policies.

4. Sidecar Proxy A establishes a mutual TLS connection to Sidecar Proxy B.

5. Sidecar Proxy B receives the request and forwards it to Microservice B.

6. Microservice B processes the request and sends the response back through Sidecar Proxy B.

7. Sidecar Proxy B applies response policies and sends the response securely to Sidecar Proxy A.

8. Sidecar Proxy A forwards the response to Microservice A.

9. Throughout this flow, telemetry data is collected and sent to the telemetry system.

10. The control plane continuously updates sidecar proxies with configuration changes.
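Step 9 only yields connected end-to-end traces if every hop carries the same trace ID. The sidecars create and report the spans, but Istio's documentation notes that applications must still copy incoming trace headers onto their outgoing requests, a small exception to "no code changes". A minimal sketch using the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`); the helper names are hypothetical, only version `00` is accepted, header names are assumed to be lower-cased already, and Envoy-based meshes also commonly use B3 headers instead.

```python
import re
import secrets
from typing import Dict, Optional

# traceparent: 2-hex version, 32-hex trace id, 16-hex parent span id, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace, as the first proxy does when no header is present."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(incoming: Optional[str]) -> str:
    """Keep the trace id and flags, but give this hop a fresh span id."""
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        return new_traceparent()            # missing or malformed: start a new trace
    trace_id, _caller_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers: Dict[str, str]) -> Dict[str, str]:
    """Headers service A must forward on its call to service B so the trace joins up."""
    return {"traceparent": child_traceparent(incoming_headers.get("traceparent"))}
```

If service A drops the header, the call to service B starts a new trace, and the tracing backend shows two unrelated fragments instead of one request path.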

Database Schema

Not applicable: a service mesh manages runtime traffic and configuration, not persistent application data.

Scaling Discussion

Bottlenecks

Sidecar proxies becoming CPU or memory bottlenecks under high traffic

Control plane overwhelmed by frequent configuration updates

Certificate authority latency during certificate issuance or rotation

Telemetry system storage and query performance with large volumes of data

Solutions

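Sidecars scale with their services: each new replica brings its own proxy, so spread replicas across nodes and tune proxy CPU and memory requests and limits; limiting the configuration each proxy receives to the services it actually calls also reduces its memory use

Scale the control plane horizontally and cache configuration, batching rapid changes into fewer pushes to the proxies

Rotate certificates in the background well before they expire, so CA latency never sits on the request path; proxies keep serving with their current valid certificate meanwhile

Use scalable telemetry backends and trace sampling to reduce data volume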
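Sampling only reduces telemetry volume safely if every hop makes the same keep-or-drop decision for a given request; otherwise traces arrive with missing spans. One approach, similar in spirit to OpenTelemetry's `TraceIdRatioBased` sampler, is to derive the decision from the trace ID itself so all proxies agree without coordinating. The sketch below assumes a 128-bit hex trace ID whose low 64 bits are uniformly random and a hypothetical 1% rate; `should_sample` is an invented name, not a library API.

```python
import secrets


def should_sample(trace_id_hex: str, rate: float) -> bool:
    """Keep a trace iff its id falls in the lowest `rate` fraction of the id space.

    Uses the low 64 bits of the trace id, assumed to be uniformly random, so
    every proxy that sees the same trace id reaches the same decision.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    low_bits = int(trace_id_hex[-16:], 16)
    return low_bits < int(rate * 2**64)


if __name__ == "__main__":
    ids = [secrets.token_hex(16) for _ in range(100_000)]
    kept = sum(should_sample(t, 0.01) for t in ids)
    print(f"kept {kept} of {len(ids)} traces")   # close to 1,000
```

At 10,000 requests per second, a 1% rate still yields about 100 complete traces per second, usually enough to spot latency outliers while cutting trace storage by two orders of magnitude.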

Interview Tips

Time: Spend 10 minutes understanding requirements and clarifying scope, 20 minutes designing the architecture and explaining components, 10 minutes discussing scaling and trade-offs, 5 minutes for questions.

Explain why sidecar proxies are used to manage traffic without changing microservice code

Describe how mutual TLS secures communication between services

Highlight observability benefits from telemetry collected by the mesh

Discuss traffic control features like retries, circuit breakers, and routing

Address scaling challenges and how to mitigate bottlenecks

Practice


1. Why does a service mesh manage inter-service traffic in a microservices architecture?

easy

A. To improve security, reliability, and observability between services

B. To replace the need for a database in microservices

C. To write the business logic inside each service

D. To increase the size of each service for better performance


Solution

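Step 1: Understand the role of service mesh. A service mesh handles how services talk to each other: mutual TLS for security, retries and circuit breaking for reliability, and metrics, logs, and traces for observability.

Step 2: Identify what service mesh does not do. It does not replace databases (B), implement business logic (C), or make services bigger (D).

Final Answer: A

Quick Check: security + reliability + observability between services = A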
