Service discovery concept in Microservices - System Design Exercise

Design: Service Discovery System for Microservices

Design the service discovery mechanism including registration, lookup, and health checking. Out of scope: detailed microservice business logic, security/authentication mechanisms.

Functional Requirements

FR1: Automatically register new microservice instances when they start

FR2: Allow microservices to find and communicate with other services dynamically

FR3: Support health checks to remove unhealthy service instances

FR4: Handle service instance failures gracefully

FR5: Provide low latency service lookup

FR6: Support scaling to thousands of service instances

Non-Functional Requirements

NFR1: System must handle 10,000+ service instances

NFR2: Service lookup latency p99 < 50ms

NFR3: Availability target 99.9% uptime

NFR4: Support dynamic scaling of services without manual intervention

Think Before You Design

Key Components

Service Registry (centralized or distributed)

Health Check Service

Client-side Discovery Library

Load Balancer or API Gateway

Heartbeat or TTL mechanism for service liveness

Design Patterns

Client-side service discovery

Server-side service discovery

Push vs Pull health checks

Caching for reducing lookup latency

Leader election for distributed registry

Reference Architecture

          +----------------------+
          |      Service A       |
          |   (Service Client)   |
          +----------+-----------+
                     |
                     | Service Lookup Request
                     v
          +----------------------+                +----------------------+
          |   Service Registry   |<-- Register ---|  Service Instances   |
          | (e.g. Consul, etcd)  |                |  (Service B, C...)   |
          +----------+-----------+                +----------+-----------+
                     ^                                       |
                     |        Health Check Heartbeats        |
                     +---------------------------------------+

Components

Service Registry

Consul / etcd / ZooKeeper

Stores service instance information and provides lookup APIs

Service Instances

Microservices running in containers or VMs

Health Check Service

Built-in or external health check mechanism

Monitors service instance health and updates registry

Client-side Discovery Library

SDK or client library integrated in services

Queries registry and selects healthy service instances

Load Balancer / API Gateway

Nginx, Envoy, or cloud load balancer

Optionally routes requests to discovered service instances
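
The selection step performed by the Client-side Discovery Library can be illustrated with a small round-robin picker. This is only a sketch: the class name RoundRobinSelector, the dictionary shape of an instance, and the "healthy" status value are assumptions made for the example, not part of any particular registry's API.

```python
class RoundRobinSelector:
    """Hypothetical helper: rotates through the healthy instances of a service."""

    def __init__(self):
        self._counters = {}  # service_name -> number of picks made so far

    def pick(self, service_name, instances):
        # Ignore anything the registry has not marked healthy.
        healthy = [i for i in instances if i["status"] == "healthy"]
        if not healthy:
            raise LookupError(f"no healthy instances for {service_name}")
        n = self._counters.get(service_name, 0)
        self._counters[service_name] = n + 1
        return healthy[n % len(healthy)]


instances = [
    {"address": "10.0.0.1:8080", "status": "healthy"},
    {"address": "10.0.0.2:8080", "status": "unhealthy"},
    {"address": "10.0.0.3:8080", "status": "healthy"},
]
selector = RoundRobinSelector()
first = selector.pick("orders", instances)["address"]
second = selector.pick("orders", instances)["address"]
third = selector.pick("orders", instances)["address"]
```

Round robin is just one possible policy; random choice or least-connections selection would fit the same interface.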

Request Flow

1. Service instance starts and registers itself with the Service Registry.

2. Service instance periodically sends health check heartbeats to the registry.

3. Client service queries the Service Registry to discover available instances of a target service.

4. Client-side discovery library selects a healthy instance from the registry response.

5. Client sends request directly to the selected service instance.

6. If a service instance fails health checks, the registry removes it from the available list.

7. Clients receive updated service lists on next lookup or via cache invalidation.
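
The flow above can be sketched as a toy in-memory registry. This is a minimal illustration under stated assumptions, not how Consul, etcd, or ZooKeeper work internally: the class InMemoryRegistry, its method names, and the 30-second TTL are all hypothetical. The clock is injectable so that expiry can be exercised without sleeping.

```python
import time
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    address: str
    last_heartbeat: float


class InMemoryRegistry:
    """Toy registry: register, heartbeat, lookup, and TTL-based eviction."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # assumed value, tune per deployment
        self.clock = clock
        self._services = {}  # service_name -> {instance_id: Instance}

    def register(self, service_name, instance_id, address):
        # Step 1: an instance announces itself on startup.
        instances = self._services.setdefault(service_name, {})
        instances[instance_id] = Instance(instance_id, address, self.clock())

    def heartbeat(self, service_name, instance_id):
        # Step 2: refresh liveness; False tells the caller to re-register.
        instance = self._services.get(service_name, {}).get(instance_id)
        if instance is None:
            return False
        instance.last_heartbeat = self.clock()
        return True

    def evict_expired(self):
        # Step 6: drop instances whose last heartbeat is older than the TTL.
        now = self.clock()
        removed = []
        for name, instances in self._services.items():
            for instance_id in list(instances):
                if now - instances[instance_id].last_heartbeat > self.ttl_seconds:
                    del instances[instance_id]
                    removed.append((name, instance_id))
        return removed

    def lookup(self, service_name):
        # Step 3: return addresses of instances that are still live.
        self.evict_expired()
        return [i.address for i in self._services.get(service_name, {}).values()]
```

Evicting on lookup keeps the sketch short; a real registry would more likely run eviction on a background timer so that reads stay cheap.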

Database Schema

Entities:

- Service: {service_id (PK), service_name}

- ServiceInstance: {instance_id (PK), service_id (FK), ip_address, port, status, last_heartbeat_timestamp}

Relationships:

- One Service has many ServiceInstances

- ServiceInstance status is updated by Health Check results
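
As a rough illustration, the two entities can be expressed as tables in an in-memory SQLite database. The column types, the table names in snake_case, the 'healthy' status value, and the helper healthy_endpoints are assumptions chosen for the example; the schema above does not specify them.

```python
import sqlite3

# Column types are assumed; the schema in the text only names the fields.
SCHEMA = """
CREATE TABLE service (
    service_id   INTEGER PRIMARY KEY,
    service_name TEXT NOT NULL UNIQUE
);
CREATE TABLE service_instance (
    instance_id              TEXT PRIMARY KEY,
    service_id               INTEGER NOT NULL REFERENCES service(service_id),
    ip_address               TEXT NOT NULL,
    port                     INTEGER NOT NULL,
    status                   TEXT NOT NULL,
    last_heartbeat_timestamp REAL NOT NULL
);
"""


def healthy_endpoints(conn, service_name):
    """Return 'ip:port' strings for healthy instances of one service."""
    rows = conn.execute(
        """
        SELECT i.ip_address, i.port
        FROM service_instance AS i
        JOIN service AS s ON s.service_id = i.service_id
        WHERE s.service_name = ? AND i.status = 'healthy'
        ORDER BY i.instance_id
        """,
        (service_name,),
    ).fetchall()
    return [f"{ip}:{port}" for ip, port in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO service VALUES (1, 'orders')")
conn.executemany(
    "INSERT INTO service_instance VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("orders-1", 1, "10.0.0.1", 8080, "healthy", 100.0),
        ("orders-2", 1, "10.0.0.2", 8080, "unhealthy", 40.0),
    ],
)
endpoints = healthy_endpoints(conn, "orders")
```

A relational store is used here only to make the one-to-many relationship concrete; registries such as Consul or etcd keep this data in a replicated key-value store instead.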

Scaling Discussion

Bottlenecks

Service Registry becomes a single point of failure or bottleneck at high scale

High frequency health checks increase load on registry and network

Clients querying the registry too often add latency and load

Stale or inconsistent service instance data due to network partitions
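
To get a feel for the heartbeat bottleneck, a back-of-the-envelope estimate helps. The sketch below takes the 10,000 instances from NFR1 and tries a few heartbeat intervals; the intervals themselves are assumed values for illustration, since the requirements do not fix one.

```python
def heartbeat_writes_per_second(instance_count, heartbeat_interval_seconds):
    """Average registry writes per second if every instance heartbeats once per interval."""
    return instance_count / heartbeat_interval_seconds


# Assumed intervals, in seconds; 10,000 instances comes from NFR1.
for interval in (1, 5, 10, 30):
    rate = heartbeat_writes_per_second(10_000, interval)
    print(f"interval={interval}s -> {rate:.0f} writes/s")
```

Lengthening the interval lowers the write rate proportionally, at the cost of detecting a failed instance more slowly.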

Solutions

Use a distributed, replicated service registry with leader election (e.g. Consul, etcd cluster)

Implement TTL-based registrations and push health checks to reduce polling

Add client-side caching with short TTL and exponential backoff for lookups

Use quorum-based writes and reads in registry to ensure consistency

Partition registry data by service or region to reduce load
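
The client-side caching idea can be sketched as follows. This is one possible reading of "short TTL and exponential backoff", not a definitive implementation: the class CachedLookup, the fetch callback, and the default timings are hypothetical. When the registry is unreachable the sketch serves the last known list and waits progressively longer before retrying.

```python
import time


class CachedLookup:
    """Hypothetical client cache: short TTL, stale-on-error, exponential backoff."""

    def __init__(self, fetch, ttl_seconds=5.0, base_backoff=0.5,
                 max_backoff=30.0, clock=time.monotonic):
        self.fetch = fetch  # callable(service_name) -> list of addresses
        self.ttl_seconds = ttl_seconds
        self.base_backoff = base_backoff
        self.max_backoff = max_backoff
        self.clock = clock
        self._cache = {}  # service_name -> (expires_at, instances)
        self._failures = 0
        self._retry_at = 0.0

    def get(self, service_name):
        now = self.clock()
        entry = self._cache.get(service_name)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh cache hit, no registry call
        stale = entry[1] if entry is not None else []
        if now < self._retry_at:
            return stale  # still backing off after an earlier failure
        try:
            instances = self.fetch(service_name)
        except Exception:
            # Double the wait after each consecutive failure, up to a cap.
            self._failures += 1
            delay = min(self.max_backoff,
                        self.base_backoff * 2 ** (self._failures - 1))
            self._retry_at = now + delay
            return stale
        self._failures = 0
        self._retry_at = 0.0
        self._cache[service_name] = (now + self.ttl_seconds, instances)
        return instances
```

Serving stale data during an outage trades freshness for availability, which matches the 99.9% availability target better than failing every lookup while the registry is down.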

Interview Tips

Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.

Explain difference between client-side and server-side discovery

Discuss health check mechanisms and failure handling

Highlight importance of low latency and high availability

Mention caching strategies to reduce load

Describe how distributed registries maintain consistency and availability

Practice

1. What is the main purpose of service discovery in a microservices architecture? (easy)

A. To help services find and communicate with each other automatically

B. To store user data securely

C. To manage database transactions

D. To handle user authentication
