Services and networking in Microservices - Scalability & System Analysis

Scalability Analysis - Services and networking
Growth Table: Services and Networking at Different Scales
| Scale | Number of services | Network traffic | Latency | Service discovery | Load balancing | Security |
| --- | --- | --- | --- | --- | --- | --- |
| 100 users | 5-10 small services | Low, few calls per second | Low latency, simple direct calls | Static or simple DNS | Basic round-robin | Simple TLS, basic auth |
| 10,000 users | 20-50 services | Moderate, hundreds of calls/sec | Moderate latency, some retries | Dynamic service registry (e.g., Consul) | Software load balancers, health checks | Mutual TLS, token-based auth |
| 1 million users | 100+ services | High, thousands of calls/sec | Higher latency, circuit breakers needed | Robust service mesh (e.g., Istio) | Advanced load balancing, global LB | Zero trust, fine-grained policies |
| 100 million users | Hundreds of services, multi-region | Very high, millions of calls/sec | Latency-critical, edge caching | Global service mesh, multi-cluster | Geo-distributed LB, autoscaling | Automated security, compliance |
First Bottleneck

At small scale (~100 users), the network is simple and calls are direct, so there is no major bottleneck.

At medium scale (~10K users), the first bottleneck is service discovery and load balancing. As the number of services and calls grows, static DNS or simple load balancers can't keep up with instances being added, removed, or failing health checks.
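To make this bottleneck concrete, here is a minimal sketch of what a dynamic registry adds over static DNS: instances send periodic heartbeats, lookups skip any instance whose last heartbeat is older than a TTL, and callers round-robin across the instances that remain. The class and method names (`ServiceRegistry`, `heartbeat`, `healthy`, `pick`) and the 10-second TTL are hypothetical and do not mirror Consul's or Eureka's actual APIs; a real registry also replicates its state and serves it over the network.

```python
import time
from typing import Dict, List, Optional


class ServiceRegistry:
    """Toy in-memory registry (hypothetical): instances heartbeat, stale ones are skipped."""

    def __init__(self, ttl_seconds: float = 10.0) -> None:
        self.ttl_seconds = ttl_seconds
        # service name -> {instance address -> time of last heartbeat}
        self._instances: Dict[str, Dict[str, float]] = {}
        # service name -> round-robin counter
        self._next: Dict[str, int] = {}

    def heartbeat(self, service: str, address: str, now: Optional[float] = None) -> None:
        """Register an instance, or refresh its liveness timestamp."""
        now = time.monotonic() if now is None else now
        self._instances.setdefault(service, {})[address] = now

    def healthy(self, service: str, now: Optional[float] = None) -> List[str]:
        """Instances whose last heartbeat falls within the TTL, in stable (sorted) order."""
        now = time.monotonic() if now is None else now
        entries = self._instances.get(service, {})
        return sorted(a for a, seen in entries.items() if now - seen <= self.ttl_seconds)

    def pick(self, service: str, now: Optional[float] = None) -> str:
        """Round-robin across the currently healthy instances only."""
        live = self.healthy(service, now)
        if not live:
            raise LookupError(f"no healthy instances of {service!r}")
        n = self._next.get(service, 0)
        self._next[service] = n + 1
        return live[n % len(live)]
```

For example, if two instances of a hypothetical `orders` service heartbeat at t=0 and only one of them heartbeats again before t=15, `pick("orders", now=15)` returns only the instance that is still alive, so callers never see the stale address.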

At large scale (~1M users), the sheer volume of inter-service calls adds latency and can overload downstream services. The bottleneck shifts to network bandwidth and to the complexity of managing service-to-service communication securely and reliably.

At very large scale (~100M users), the bottleneck is global network coordination, multi-region latency, and security policy enforcement across clusters.

Scaling Solutions
  • Service Discovery: Move from static DNS to dynamic registries like Consul or Eureka, then to service meshes with built-in discovery.
  • Load Balancing: Start with simple round-robin, then software load balancers with health checks, and finally global load balancers with geo-routing.
  • Network Traffic: Use circuit breakers, retries, and rate limiting to reduce overload. Employ service mesh proxies to manage traffic efficiently.
  • Security: Implement TLS encryption, then mutual TLS, and finally zero-trust models with fine-grained policies enforced by the service mesh.
  • Multi-Region: Deploy services across regions with global service mesh and data replication to reduce latency and improve availability.
  • Monitoring and Observability: Use distributed tracing and metrics to detect bottlenecks early and optimize network paths.
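The circuit breakers mentioned under Network Traffic can be sketched as a small state machine: after a number of consecutive failures the breaker opens and rejects calls immediately, and after a cool-down it lets a trial call through (half-open) to test whether the downstream service has recovered. This is a single-threaded illustration with assumed parameters (`failure_threshold`, `reset_timeout`); it does not limit how many trial calls pass while half-open, and production libraries or service-mesh proxies add concurrency control, sliding failure windows, and metrics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the downstream service while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self._clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cool-down elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("downstream marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open is what keeps one slow service from tying up threads and connections in every caller, which is how overload otherwise cascades through a large call graph.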
Back-of-Envelope Cost Analysis
  • At 10,000 users, expect hundreds to thousands of inter-service calls per second. Each call adds network overhead and CPU load on proxies.
  • Network bandwidth: 1,000 calls/sec with a 10 KB payload each is ~10 MB/s (80 Mbps), about 8% of a 1 Gbps link, so it is easily manageable.
  • At 1 million users, calls can reach tens of thousands per second, requiring multiple load balancers and service mesh proxies per cluster.
  • Storage: service registry data grows with the number of service instances, while logs and traces grow roughly linearly with call volume; plan for scalable storage for both.
  • Security overhead (encryption, auth) adds CPU cost; hardware acceleration or dedicated security proxies may be needed at scale.
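The bandwidth figure above is simple arithmetic; the helper below reproduces it with decimal units (1 KB = 1,000 bytes) and also reports link utilization. The 20,000 calls/sec value in the second call is an assumed number within the "tens of thousands" range given for 1 million users, and the estimate ignores headers, TLS framing, and retries, so real traffic will be somewhat higher.

```python
def estimate_bandwidth(calls_per_sec: float, payload_kb: float, link_gbps: float = 1.0):
    """Return (MB/s, Mbps, fraction of the link used), using decimal units."""
    bytes_per_sec = calls_per_sec * payload_kb * 1_000  # 1 KB = 1,000 bytes
    mb_per_sec = bytes_per_sec / 1_000_000              # 1 MB = 1,000,000 bytes
    mbps = mb_per_sec * 8                                # 8 bits per byte
    utilization = mbps / (link_gbps * 1_000)             # 1 Gbps = 1,000 Mbps
    return mb_per_sec, mbps, utilization


# 1,000 calls/sec with 10 KB payloads on a 1 Gbps link
print(estimate_bandwidth(1_000, 10))   # (10.0, 80.0, 0.08)
# Assumed 20,000 calls/sec at the 1M-user scale
print(estimate_bandwidth(20_000, 10))  # (200.0, 1600.0, 1.6)
```

A utilization above 1.0 (here 1.6 Gbps of payload on a 1 Gbps link) is the point where traffic must be spread across multiple links, load balancers, and proxies, which matches the 1-million-user bullet.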
Interview Tip

When discussing scalability of services and networking, start by defining the scale and traffic patterns.

Identify the first bottleneck clearly (e.g., service discovery or load balancing).

Explain how you would incrementally improve: dynamic discovery, load balancing, service mesh, security.

Use real numbers to justify your choices and show understanding of network overhead and latency.

Always mention monitoring and observability as key to managing complexity.

Self Check

Your service discovery system handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Upgrade from static or simple service discovery to a dynamic, scalable service registry or service mesh that can handle higher QPS with health checks and load balancing. This prevents stale or overloaded endpoints and reduces latency.

Key Result
As user count and service calls grow, the first bottleneck in microservices networking is service discovery and load balancing. Scaling requires moving from static DNS to dynamic registries and service meshes, plus advanced load balancing and security to handle high traffic and complexity.

Practice

1. What is the main purpose of service discovery in a microservices architecture?
easy
A. To manage database transactions
B. To store user data securely
C. To help services find and communicate with each other dynamically
D. To handle user authentication

Solution

  1. Step 1: Understand service discovery role

    Service discovery allows services to locate each other without hardcoding addresses.
  2. Step 2: Match purpose with options

    Only option C describes dynamic service location; the other options relate to different concerns (data storage, transactions, authentication).
  3. Final Answer:

    To help services find and communicate with each other dynamically -> Option C
  4. Quick Check:

    Service discovery = dynamic service location [OK]
Hint: Service discovery = finding services automatically [OK]
Common Mistakes:
  • Confusing service discovery with authentication
  • Thinking it manages databases
  • Assuming it stores user data
2. Which of the following is the correct way to specify a REST API call in microservices networking?
easy
A. FETCH /users/api HTTP/1.0
B. POST /api/v1/users HTTP/1.1
C. CONNECT users /api/v1 HTTP/2
D. SEND /api/users HTTP/1.1

Solution

  1. Step 1: Identify standard HTTP methods and syntax

    An HTTP request line has three parts: a standard method (GET, POST, PUT, DELETE, etc.), a request path (URI), and the HTTP version. REST APIs build on these standard methods.
  2. Step 2: Match correct syntax

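    POST /api/v1/users HTTP/1.1 uses a standard method (POST), a well-formed path, and a valid version. FETCH and SEND are not HTTP methods, and option C is malformed: CONNECT is a real method, but it is meant for tunneling, and the line has an extra token.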
  3. Final Answer:

    POST /api/v1/users HTTP/1.1 -> Option B
  4. Quick Check:

    REST API call = HTTP method + URI + version [OK]
Hint: REST calls use standard HTTP verbs and URIs [OK]
Common Mistakes:
  • Using non-standard HTTP methods
  • Incorrect URI format
  • Wrong HTTP version syntax
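The request-line rule from this question can be checked mechanically. The sketch below accepts only the HTTP/1.x text form `METHOD /path HTTP/1.x` with a standard method; it is a simplified illustration, not a full parser. Real servers also accept other request-target forms, and HTTP/2 sends the method and path as binary-framed pseudo-headers rather than as a text request line, which is one more reason option C's "HTTP/2" suffix does not fit.

```python
# Methods defined in HTTP Semantics (RFC 9110), plus PATCH (RFC 5789)
STANDARD_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE",
                    "CONNECT", "OPTIONS", "TRACE", "PATCH"}
# Only text-based HTTP/1.x request lines are considered here
HTTP1_VERSIONS = {"HTTP/1.0", "HTTP/1.1"}


def is_valid_request_line(line: str) -> bool:
    """True if `line` is METHOD SP /path SP HTTP/1.x with a standard method."""
    parts = line.split(" ")
    if len(parts) != 3:
        return False  # must be exactly three space-separated tokens
    method, target, version = parts
    # REST calls use origin-form targets, which start with "/"
    return method in STANDARD_METHODS and target.startswith("/") and version in HTTP1_VERSIONS


for option in ["FETCH /users/api HTTP/1.0", "POST /api/v1/users HTTP/1.1",
               "CONNECT users /api/v1 HTTP/2", "SEND /api/users HTTP/1.1"]:
    print(option, "->", is_valid_request_line(option))
```

Only option B passes: A and D fail on the method, and C fails because it has four tokens.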
3. Given the following microservice call sequence:
Service A calls Service B via HTTP.
Service B calls Service C via gRPC.
Service C responds with data.

What is the correct order of communication protocols used in this flow?
medium
A. Only gRPC
B. gRPC then HTTP
C. Only HTTP
D. HTTP then gRPC

Solution

  1. Step 1: Trace the call sequence

    Service A calls B using HTTP first, then B calls C using gRPC.
  2. Step 2: Identify protocol order

    The communication starts with HTTP and then switches to gRPC.
  3. Final Answer:

    HTTP then gRPC -> Option D
  4. Quick Check:

    Call sequence protocols = HTTP then gRPC [OK]
Hint: Follow call chain to list protocols in order [OK]
Common Mistakes:
  • Mixing protocol order
  • Assuming only one protocol is used
  • Ignoring protocol differences
4. A microservice is failing to connect to another service using its hardcoded IP address. What is the most likely cause and fix?
medium
A. IP address changed; use service discovery instead of hardcoding
B. Service is down; restart the service
C. Network cable unplugged; check physical connections
D. Firewall blocking traffic; disable firewall

Solution

  1. Step 1: Identify problem with hardcoded IP

    Hardcoded IPs break when services move or scale, causing connection failures.
  2. Step 2: Recommend dynamic service discovery

    Using service discovery allows services to find current addresses dynamically, fixing the issue.
  3. Final Answer:

    IP address changed; use service discovery instead of hardcoding -> Option A
  4. Quick Check:

    Hardcoded IP failure = use service discovery [OK]
Hint: Avoid hardcoded IPs; use service discovery [OK]
Common Mistakes:
  • Restarting services without checking addresses
  • Ignoring dynamic environment changes
  • Disabling firewall without cause
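The fix in option A amounts to looking up the target's address by service name at call time instead of pinning an IP. In this sketch the lookup is an injected function, so it could be backed by DNS, a registry client, or, as in the example, a plain dictionary standing in for the registry. `ServiceClient`, the `inventory` service, and the addresses are hypothetical.

```python
from typing import Callable

# Hypothetical lookup function: service name -> "host:port"
Resolver = Callable[[str], str]


class ServiceClient:
    """Resolves the target service by name on every request instead of pinning an IP."""

    def __init__(self, service_name: str, resolve: Resolver) -> None:
        self.service_name = service_name
        self._resolve = resolve

    def url(self, path: str) -> str:
        address = self._resolve(self.service_name)  # fresh lookup, so moves are picked up
        return f"http://{address}{path}"


# A dict stands in for the registry in this example
directory = {"inventory": "10.0.3.7:8080"}
client = ServiceClient("inventory", directory.__getitem__)
print(client.url("/items"))  # http://10.0.3.7:8080/items
directory["inventory"] = "10.0.5.2:8080"  # the service was rescheduled elsewhere
print(client.url("/items"))  # http://10.0.5.2:8080/items
```

Real clients usually cache lookups for a short TTL rather than resolving on every request, trading a little staleness for far less load on the discovery system.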
5. You are designing a microservices system where services must communicate securely and efficiently. Which combination of networking components is best to ensure service discovery, secure communication, and load balancing?
hard
A. Service registry for discovery, TLS for security, and API gateway for load balancing
B. Static IPs for discovery, HTTP for communication, and DNS for load balancing
C. Manual config files for discovery, plain TCP sockets, and round-robin DNS
D. No discovery needed, use UDP for speed, and client-side load balancing

Solution

  1. Step 1: Identify components for service discovery

    A service registry dynamically tracks services, enabling discovery.
  2. Step 2: Choose secure communication and load balancing

    TLS encrypts traffic between services; an API gateway can spread incoming requests across service instances.
  3. Step 3: Evaluate other options

    Static IPs and manual config files break when services move or scale; plain TCP and UDP provide no encryption; round-robin DNS has no health checks and reacts slowly to changes because clients cache records.
  4. Final Answer:

    Service registry for discovery, TLS for security, and API gateway for load balancing -> Option A
  5. Quick Check:

    Discovery + TLS + API gateway = secure scalable system [OK]
Hint: Combine registry, TLS, and gateway for best networking [OK]
Common Mistakes:
  • Using static IPs instead of dynamic discovery
  • Ignoring encryption needs
  • Relying solely on DNS for load balancing
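For the TLS part of option A (and the mutual-TLS step in the scaling solutions), Python's standard `ssl` module can configure both sides of a connection. This is a minimal sketch: the certificate, key, and CA file paths are placeholders you would supply, and they are optional here only so the contexts can be inspected without files (a real server must load its certificate). Setting `verify_mode = ssl.CERT_REQUIRED` on the server is what turns one-way TLS into mutual TLS. In a service mesh, the sidecar proxies usually handle this so application code does not.

```python
import ssl
from typing import Optional


def client_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server's certificate and hostname; optionally present a cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        # Presenting a client certificate is the client's half of mutual TLS
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def server_context(cert_file: Optional[str] = None,
                   key_file: Optional[str] = None,
                   ca_file: Optional[str] = None,
                   require_client_cert: bool = True) -> ssl.SSLContext:
    """Server side: present the server cert and, for mutual TLS, demand one from every client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if require_client_cert:
        ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a trusted certificate
    return ctx
```

In use, each service would pass its own certificate and the internal CA bundle, for example `server_context("orders.pem", "orders.key", "internal-ca.pem")` with hypothetical file names.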