
Service mesh concept in Microservices - System Design Guide

Problem Statement
As the number of microservices grows, managing communication between them becomes complex. When each service implements networking and observability on its own, the result is lost requests, inconsistent retry behavior, and security gaps. Service-to-service communication becomes unreliable, and failures become hard to debug.
Solution
A service mesh adds a dedicated infrastructure layer that manages all service-to-service communication. It uses lightweight proxies alongside each service to handle routing, retries, security, and monitoring uniformly. This separates communication logic from business code, making interactions reliable and observable without changing the services themselves.
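The separation described above can be sketched in a few lines of Python. This is a toy model, not how any real mesh is implemented: `SidecarProxy`, `service_b`, and the in-process "routes" are hypothetical stand-ins for a real proxy and real network endpoints. It only shows that the calling service refers to a logical name while the proxy owns routing and access logging.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a network endpoint: request in, response out.
Handler = Callable[[str], str]


class SidecarProxy:
    """Toy sidecar: owns routing and access logging so service code does not."""

    def __init__(self, owner: str, routes: Dict[str, Handler]) -> None:
        self.owner = owner
        self.routes = routes  # logical destination name -> handler
        self.access_log: List[Tuple[str, str, str]] = []  # (source, destination, outcome)

    def call(self, destination: str, request: str) -> str:
        handler = self.routes.get(destination)
        if handler is None:
            self.access_log.append((self.owner, destination, "no_route"))
            raise LookupError(f"no route to {destination}")
        try:
            response = handler(request)
        except Exception:
            self.access_log.append((self.owner, destination, "error"))
            raise
        self.access_log.append((self.owner, destination, "ok"))
        return response


def service_b(request: str) -> str:
    """Business logic of the callee; it knows nothing about routing or logging."""
    return f"B handled {request}"


# Service A's code only knows the logical name "service-b".
proxy_a = SidecarProxy("service-a", {"service-b": service_b})
reply = proxy_a.call("service-b", "order-42")
```

Changing where `service-b` lives, or what gets logged, touches only the proxy's configuration, which is the property the pattern is after.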
Architecture
[Diagram: Service A and Service B, each paired with a sidecar proxy, with a Control Plane connected to both proxies]

This diagram shows microservices each paired with a proxy that manages communication. The control plane configures and monitors these proxies to enforce policies and collect telemetry.
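The relationship between the control plane and the proxies can be illustrated with a small sketch. All names here (`Policy`, `Proxy`, `ControlPlane`, `push`, `report`) are invented for illustration and do not correspond to any real mesh API; real control planes distribute configuration over the network rather than by direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Policy:
    """Hypothetical mesh-wide policy that every proxy should enforce."""
    max_retries: int = 0
    mtls_required: bool = False


@dataclass
class Proxy:
    """Data-plane side: holds whatever policy the control plane last sent."""
    service: str
    policy: Policy = field(default_factory=Policy)
    version: int = 0

    def apply(self, policy: Policy, version: int) -> None:
        self.policy = policy
        self.version = version


class ControlPlane:
    """Keeps track of proxies and pushes one consistent policy to all of them."""

    def __init__(self) -> None:
        self.proxies: List[Proxy] = []
        self.version = 0

    def register(self, proxy: Proxy) -> None:
        self.proxies.append(proxy)

    def push(self, policy: Policy) -> None:
        self.version += 1
        for proxy in self.proxies:
            proxy.apply(policy, self.version)

    def report(self) -> Dict[str, int]:
        """Config version each proxy is running, useful for spotting stale proxies."""
        return {p.service: p.version for p in self.proxies}


plane = ControlPlane()
proxy_a, proxy_b = Proxy("service-a"), Proxy("service-b")
plane.register(proxy_a)
plane.register(proxy_b)
plane.push(Policy(max_retries=2, mtls_required=True))
```

After the push, both proxies enforce the same retry and mTLS settings without either service being redeployed.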

Trade-offs
✓ Pros
Centralizes communication features like retries, load balancing, and security without changing service code.
Improves observability by collecting detailed metrics and traces for all service interactions.
Enables fine-grained security policies such as mutual TLS between services.
Simplifies complex microservice networking with consistent behavior across services.
✗ Cons
Adds operational complexity and resource overhead due to sidecar proxies running alongside each service.
Increases latency slightly because all traffic passes through proxies.
Requires expertise to configure and maintain the control plane and proxies correctly.
Use when: You run dozens or more microservices that need secure, reliable, and observable communication at scale, especially in dynamic environments like Kubernetes.
Avoid when: The system has fewer than 10 services or simple communication needs, since the added complexity and resource cost may outweigh the benefits.
Real World Examples
Google
Co-founded the open-source Istio service mesh together with IBM and Lyft to manage service communication and security on Kubernetes.
Lyft
Created the Envoy proxy, which meshes such as Istio use as their sidecar, to handle resilient service-to-service communication and observability.
IBM
Uses service mesh to enforce security policies and monitor microservices in hybrid cloud environments.
Alternatives
API Gateway
API Gateway manages external client-to-service traffic, while service mesh manages internal service-to-service communication.
Use when: You need to control and secure traffic entering your system from outside clients.
Client-side Load Balancing
Client-side load balancing requires each service to implement communication logic, whereas service mesh centralizes this in proxies.
Use when: You want a simpler setup with fewer infrastructure components and can modify service code.
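To make the client-side alternative concrete, here is a minimal round-robin balancer of the kind each service would have to embed in its own code. The class name and the endpoint addresses are made up for the example, and a production version would also need health checks and endpoint discovery.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Cycles through a fixed endpoint list; lives inside the calling service."""

    def __init__(self, endpoints: List[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle: Iterator[str] = itertools.cycle(endpoints)

    def pick(self) -> str:
        """Return the next endpoint in rotation."""
        return next(self._cycle)


# Hypothetical instance addresses of a downstream service.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
picks = [balancer.pick() for _ in range(4)]
```

Every service, in every language used by the team, needs its own copy of logic like this, which is the duplication a mesh moves into the proxies.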
Summary
A service mesh manages communication between microservices using sidecar proxies and a control plane.
It improves reliability, security, and observability without changing application code.
A service mesh suits large, complex microservice environments, but it adds operational overhead.

Practice

1. What is the main purpose of a service mesh in microservices architecture?
easy
A. To write application business logic
B. To store data for microservices
C. To replace microservices with monolithic applications
D. To manage communication between microservices securely and reliably

Solution

  1. Step 1: Understand the role of service mesh

    A service mesh handles how microservices talk to each other, focusing on communication.
  2. Step 2: Identify what service mesh does not do

    It does not store data, replace microservices, or write business logic.
  3. Final Answer:

    To manage communication between microservices securely and reliably -> Option D
  4. Quick Check:

    Service mesh = communication management [OK]
Hint: Service mesh controls microservice communication, not data or logic [OK]
Common Mistakes:
  • Confusing service mesh with data storage
  • Thinking service mesh replaces microservices
  • Assuming service mesh writes app code
2. Which of the following is a common tool used to implement a service mesh?
easy
A. Docker
B. Istio
C. Kubernetes
D. Git

Solution

  1. Step 1: Recall popular service mesh tools

    Istio, Linkerd, and Consul are well-known service mesh tools.
  2. Step 2: Differentiate from other tools

    Docker builds and runs containers, Kubernetes orchestrates them, and Git is version control; none of them is a service mesh.
  3. Final Answer:

    Istio -> Option B
  4. Quick Check:

    Istio = service mesh tool [OK]
Hint: Istio is a popular service mesh tool, not Docker or Git [OK]
Common Mistakes:
  • Choosing Docker or Kubernetes as service mesh
  • Confusing version control tools with service mesh
  • Mixing container tools with service mesh tools
3. Given a microservices setup with Istio service mesh, what happens when a service-to-service call fails due to network issues?
medium
A. Istio retries the call automatically based on configured policies
B. The call fails immediately without retries
C. Istio shuts down the service permanently
D. The service mesh ignores the failure and logs no information

Solution

  1. Step 1: Understand Istio's retry feature

    Istio can automatically retry failed calls to improve reliability.
  2. Step 2: Eliminate incorrect behaviors

    Istio does not shut down services or ignore failures silently; it logs and manages retries.
  3. Final Answer:

    Istio retries the call automatically based on configured policies -> Option A
  4. Quick Check:

    Istio retries failed calls = true [OK]
Hint: Istio retries failed calls automatically if configured [OK]
Common Mistakes:
  • Assuming no retries happen on failure
  • Thinking Istio shuts down services on failure
  • Believing failures are ignored silently
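For intuition, the retry behavior from question 3 can be sketched as a small Python helper. This is an illustration of the general idea only: in Istio the policy is declared in configuration (a `retries` section on a VirtualService route) and applied by the proxy, not written in application code. The function name, its parameters, and the `flaky` callee are assumptions made for this example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run `call`, retrying on the listed errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # budget exhausted: surface the last failure
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


# Simulated callee that fails twice with a network error, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = call_with_retries(flaky, attempts=3)
```

The retry budget is bounded, and only errors listed in `retry_on` are retried; any other exception propagates immediately.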
4. You deployed a service mesh but notice that traffic between microservices is not encrypted. What is the most likely cause?
medium
A. The network cables are unplugged
B. The microservices are not running
C. Mutual TLS (mTLS) is not enabled in the service mesh configuration
D. The service mesh is not installed

Solution

  1. Step 1: Check encryption settings in service mesh

    Service mesh uses mutual TLS (mTLS) to encrypt traffic between services.
  2. Step 2: Identify why encryption might fail

    If mTLS is not enabled, traffic remains unencrypted despite service mesh presence.
  3. Final Answer:

    Mutual TLS (mTLS) is not enabled in the service mesh configuration -> Option C
  4. Quick Check:

    mTLS disabled = no encryption [OK]
Hint: Enable mTLS to encrypt service mesh traffic [OK]
Common Mistakes:
  • Assuming that stopped services are the reason traffic is unencrypted
  • Thinking service mesh absence causes partial encryption
  • Overlooking that mTLS has to be enabled in the mesh configuration
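What "mutual" means in mTLS can be shown with Python's standard `ssl` module: both sides verify the peer's certificate. The sketch below is an assumption-laden illustration, not mesh configuration. In a mesh the sidecars perform this handshake and the certificates are typically issued and rotated automatically; here the helper names and the optional certificate paths are hypothetical, and the contexts are built without loading any files so the code runs as written.

```python
import ssl
from typing import Optional


def server_context(
    cert: Optional[str] = None,
    key: Optional[str] = None,
    ca: Optional[str] = None,
) -> ssl.SSLContext:
    """Server side of mTLS: present a certificate and require one from the client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # this line is what makes the TLS mutual
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # hypothetical file paths
    if ca:
        ctx.load_verify_locations(cafile=ca)  # CA that signed the client certs
    return ctx


def client_context(
    cert: Optional[str] = None,
    key: Optional[str] = None,
    ca: Optional[str] = None,
) -> ssl.SSLContext:
    """Client side of mTLS: verify the server and present a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca)
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # hypothetical file paths
    return ctx


server_ctx = server_context()
client_ctx = client_context()
```

With plain TLS the server context would leave `verify_mode` at its default of `ssl.CERT_NONE`, so only the server would be authenticated.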
5. You want to add observability to your microservices without changing their code. How does a service mesh help achieve this?
hard
A. By injecting sidecar proxies that monitor and report traffic metrics transparently
B. By rewriting the microservices code to add logging
C. By replacing microservices with a single monolithic app
D. By disabling network communication between services

Solution

  1. Step 1: Understand sidecar proxy role in service mesh

    Service mesh injects sidecar proxies alongside microservices to handle communication and monitoring without code changes.
  2. Step 2: Eliminate incorrect options

    Service mesh does not rewrite code, replace microservices, or disable communication.
  3. Final Answer:

    By injecting sidecar proxies that monitor and report traffic metrics transparently -> Option A
  4. Quick Check:

    Sidecar proxies add observability without code change [OK]
Hint: Sidecar proxies add monitoring without changing app code [OK]
Common Mistakes:
  • Thinking code must be rewritten for observability
  • Confusing service mesh with app replacement
  • Assuming communication is disabled for observability
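The idea behind question 5, observability without code changes, can be sketched as a proxy that times every call it forwards. `MeteredProxy` and its methods are hypothetical names for this illustration; a real sidecar exports such measurements to a metrics backend rather than keeping them in memory.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List


class MeteredProxy:
    """Forwards calls and records latency and errors per destination."""

    def __init__(self) -> None:
        self.latencies: Dict[str, List[float]] = defaultdict(list)
        self.errors: Dict[str, int] = defaultdict(int)

    def forward(self, destination: str, handler: Callable[[str], str], request: str) -> str:
        start = time.perf_counter()
        try:
            return handler(request)
        except Exception:
            self.errors[destination] += 1
            raise
        finally:
            # Runs on both success and failure, so every call is timed.
            self.latencies[destination].append(time.perf_counter() - start)

    def summary(self, destination: str) -> Dict[str, float]:
        samples = self.latencies[destination]
        return {
            "requests": len(samples),
            "errors": self.errors[destination],
            "avg_seconds": sum(samples) / len(samples) if samples else 0.0,
        }


proxy = MeteredProxy()
# The handler is ordinary service code with no logging or timing of its own.
proxy.forward("service-b", lambda req: req.upper(), "ping")
proxy.forward("service-b", lambda req: req.upper(), "pong")
stats = proxy.summary("service-b")
```

The handler passed to `forward` contains no instrumentation, yet request counts, error counts, and latency are all available from the proxy.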