Service discovery concept in Microservices - System Design Exercise

Design: Service Discovery System for Microservices
Design the service discovery mechanism including registration, lookup, and health checking. Out of scope: detailed microservice business logic, security/authentication mechanisms.
Functional Requirements
FR1: Automatically register new microservice instances when they start
FR2: Allow microservices to find and communicate with other services dynamically
FR3: Support health checks to remove unhealthy service instances
FR4: Handle service instance failures gracefully
FR5: Provide low latency service lookup
FR6: Support scaling to thousands of service instances
Non-Functional Requirements
NFR1: System must handle 10,000+ service instances
NFR2: Service lookup latency p99 < 50ms
NFR3: Availability target 99.9% uptime
NFR4: Support dynamic scaling of services without manual intervention
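A rough back-of-envelope estimate helps turn NFR1 and NFR3 into concrete numbers. The sketch below is illustrative only: the heartbeat interval and the per-instance lookup rate are assumptions, not part of the stated requirements.

```python
# Back-of-envelope load estimate for the registry.
INSTANCES = 10_000                # from NFR1
HEARTBEAT_INTERVAL_S = 10         # assumption: one heartbeat per instance every 10 s
LOOKUPS_PER_INSTANCE_PER_S = 1    # assumption: uncached lookups issued by each instance

# Each instance writes one heartbeat per interval.
heartbeat_writes_per_s = INSTANCES / HEARTBEAT_INTERVAL_S
# Worst case with no client-side caching at all.
lookup_reads_per_s = INSTANCES * LOOKUPS_PER_INSTANCE_PER_S

# NFR3: 99.9% availability leaves 0.1% of the year as the downtime budget.
HOURS_PER_YEAR = 365 * 24
allowed_downtime_hours = HOURS_PER_YEAR * (1 - 0.999)

print(f"heartbeat writes/s: {heartbeat_writes_per_s:.0f}")
print(f"lookup reads/s:     {lookup_reads_per_s:.0f}")
print(f"downtime budget:    {allowed_downtime_hours:.2f} h/year")
```

Under these assumed rates the registry sees about 1,000 heartbeat writes per second and up to 10,000 lookups per second, and the availability target allows roughly 8.76 hours of downtime per year.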
Think Before You Design
Key Components
Service Registry (centralized or distributed)
Health Check Service
Client-side Discovery Library
Load Balancer or API Gateway
Heartbeat or TTL mechanism for service liveness
Design Patterns
Client-side service discovery
Server-side service discovery
Push vs Pull health checks
Caching for reducing lookup latency
Leader election for distributed registry
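To make the first two patterns concrete, here is a minimal sketch contrasting client-side and server-side discovery. The in-memory `REGISTRY` dictionary, the class names, and the addresses are all hypothetical; a real system would query something like Consul or etcd instead.

```python
from typing import Dict, List

# Hypothetical in-memory registry: service name -> list of "host:port" addresses.
REGISTRY: Dict[str, List[str]] = {
    "orders": ["10.0.0.1:8080", "10.0.0.2:8080"],
}


class ClientSideDiscovery:
    """Client-side discovery: the caller queries the registry and picks an instance."""

    def __init__(self, registry: Dict[str, List[str]]) -> None:
        self._registry = registry
        self._counters: Dict[str, int] = {}  # per-service round-robin position

    def resolve(self, service: str) -> str:
        instances = self._registry.get(service, [])
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        position = self._counters.get(service, 0)
        self._counters[service] = position + 1
        return instances[position % len(instances)]  # simple round-robin


class Router:
    """Server-side discovery: callers only know the router, which resolves for them."""

    def __init__(self, registry: Dict[str, List[str]]) -> None:
        self._discovery = ClientSideDiscovery(registry)

    def forward(self, service: str, request: str) -> str:
        target = self._discovery.resolve(service)
        # A real router would proxy the request; here we only report the target.
        return f"forwarded {request!r} to {target}"
```

The trade-off is where the lookup logic lives: client-side discovery avoids an extra network hop but puts a discovery library in every service, while server-side discovery keeps clients simple at the cost of one more component on the request path.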
Reference Architecture
+---------------------+
|      Service A      |
|  (Service Client)   |
+----------+----------+
           |
           | Service Lookup Request
           v
+---------------------+          +---------------------+
|  Service Registry   |<---------|  Service Instances  |
| (e.g. Consul, etcd) |          |  (Service B, C...)  |
+----------+----------+          +----------+----------+
           ^                                |
           | Health Check Heartbeats        |
           +--------------------------------+
Components
Service Registry
Consul / etcd / ZooKeeper
Stores service instance information and provides lookup APIs
Service Instances
Microservices running in containers or VMs
Register themselves on startup and send health heartbeats
Health Check Service
Built-in or external health check mechanism
Monitors service instance health and updates registry
Client-side Discovery Library
SDK or client library integrated in services
Queries registry and selects healthy service instances
Load Balancer / API Gateway
Nginx, Envoy, or cloud load balancer
Optionally routes requests to discovered service instances
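As one possible shape for the Health Check Service, the sketch below shows a pull-style checker that marks an instance unhealthy only after several consecutive failed probes. The `probe` callable and the threshold of three failures are assumptions for illustration; in practice the probe might be an HTTP request to a health endpoint.

```python
from typing import Callable, Dict


class HealthChecker:
    """Pull-style checker: probes instances and tracks consecutive failures."""

    def __init__(self, probe: Callable[[str], bool], failure_threshold: int = 3) -> None:
        self._probe = probe                  # hypothetical probe, e.g. an HTTP health request
        self._threshold = failure_threshold  # assumption: 3 consecutive failures = unhealthy
        self._failures: Dict[str, int] = {}  # address -> consecutive failure count

    def check(self, address: str) -> bool:
        """Probe one instance; return False once it reaches the failure threshold."""
        try:
            ok = self._probe(address)
        except Exception:
            ok = False  # treat probe errors (timeouts, refused connections) as failures
        if ok:
            self._failures[address] = 0  # any success resets the counter
            return True
        self._failures[address] = self._failures.get(address, 0) + 1
        return self._failures[address] < self._threshold
```

Requiring consecutive failures is a design choice that trades slower detection for fewer false removals caused by a single dropped probe.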
Request Flow
1. Service instance starts and registers itself with the Service Registry.
2. Service instance periodically sends health check heartbeats to the registry.
3. Client service queries the Service Registry to discover available instances of a target service.
4. Client-side discovery library selects a healthy instance from the registry response.
5. Client sends request directly to the selected service instance.
6. If a service instance fails health checks, the registry removes it from the available list.
7. Clients receive updated service lists on next lookup or via cache invalidation.
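The registration, heartbeat, lookup, and removal steps above can be sketched as a small in-memory registry. This is a minimal sketch under stated assumptions, not how Consul or etcd work internally: the class name, the 30-second TTL, and the injectable `clock` parameter are all hypothetical choices made to keep the example testable.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """In-memory registry that drops instances whose heartbeat is older than a TTL."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds  # assumption: 30 s without a heartbeat means dead
        self._clock = clock      # injectable so expiry can be tested without sleeping
        # (service_name, instance_id) -> (address, last_heartbeat_time)
        self._instances: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def register(self, service: str, instance_id: str, address: str) -> None:
        """Step 1: an instance registers itself on startup."""
        self._instances[(service, instance_id)] = (address, self._clock())

    def heartbeat(self, service: str, instance_id: str) -> bool:
        """Step 2: refresh liveness; False tells the caller to re-register."""
        key = (service, instance_id)
        if key not in self._instances:
            return False
        address, _ = self._instances[key]
        self._instances[key] = (address, self._clock())
        return True

    def evict_expired(self) -> int:
        """Step 6: remove instances that missed their heartbeat window."""
        now = self._clock()
        expired = [key for key, (_, seen) in self._instances.items()
                   if now - seen > self._ttl]
        for key in expired:
            del self._instances[key]
        return len(expired)

    def lookup(self, service: str) -> List[str]:
        """Step 3: return addresses of instances currently considered alive."""
        self.evict_expired()
        return sorted(addr for (svc, _), (addr, _) in self._instances.items()
                      if svc == service)
```

Evicting lazily inside `lookup` keeps the sketch short; a production registry would more likely run expiry on a background timer so that stale entries disappear even when nobody is querying.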
Database Schema
Entities:
- Service: {service_id (PK), service_name}
- ServiceInstance: {instance_id (PK), service_id (FK), ip_address, port, status, last_heartbeat_timestamp}
Relationships:
- One Service has many ServiceInstances
- ServiceInstance status updated by Health Check results
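For illustration, the two entities can be expressed as relational tables using Python's built-in `sqlite3` module. The column types, the `UP`/`DOWN` status values, the uniqueness constraint on `service_name`, and the sample rows are assumptions; the schema above does not specify them, and a real registry typically uses a replicated key-value store rather than a relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE service (
    service_id   INTEGER PRIMARY KEY,
    service_name TEXT NOT NULL UNIQUE
);
CREATE TABLE service_instance (
    instance_id              TEXT PRIMARY KEY,
    service_id               INTEGER NOT NULL REFERENCES service(service_id),
    ip_address               TEXT NOT NULL,
    port                     INTEGER NOT NULL,
    status                   TEXT NOT NULL CHECK (status IN ('UP', 'DOWN')),
    last_heartbeat_timestamp REAL NOT NULL
);
""")

# Hypothetical sample data: one service with two instances, one of them unhealthy.
conn.execute("INSERT INTO service VALUES (1, 'orders')")
conn.executemany(
    "INSERT INTO service_instance VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("orders-1", 1, "10.0.0.1", 8080, "UP", 100.0),
        ("orders-2", 1, "10.0.0.2", 8080, "DOWN", 40.0),
    ],
)


def healthy_instances(connection: sqlite3.Connection, service_name: str):
    """Return (ip_address, port) pairs for instances of a service marked UP."""
    cursor = connection.execute(
        """
        SELECT i.ip_address, i.port
        FROM service_instance AS i
        JOIN service AS s ON s.service_id = i.service_id
        WHERE s.service_name = ? AND i.status = 'UP'
        ORDER BY i.instance_id
        """,
        (service_name,),
    )
    return cursor.fetchall()
```

The one-to-many relationship shows up as the foreign key on `service_instance`, and a lookup becomes a join filtered on `status`.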
Scaling Discussion
Bottlenecks
Service Registry becomes a single point of failure or bottleneck at high scale
High frequency health checks increase load on registry and network
Clients querying the registry too often, adding latency to requests and load on the registry
Stale or inconsistent service instance data due to network partitions
Solutions
Use a distributed, replicated service registry with leader election (e.g. Consul, etcd cluster)
Implement TTL-based registrations and push health checks to reduce polling
Add client-side caching with short TTL and exponential backoff for lookups
Use quorum-based writes and reads in registry to ensure consistency
Partition registry data by service or region to reduce load
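The client-side caching and backoff ideas can be sketched as follows. This is a minimal sketch under stated assumptions: `CachedLookup`, `backoff_delays`, the 5-second TTL, and the choice to serve stale entries when the registry is unreachable are hypothetical and not prescribed by the design above.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachedLookup:
    """Wraps a registry lookup function with a short-TTL client-side cache."""

    def __init__(self, fetch: Callable[[str], List[str]], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical function that queries the registry
        self._ttl = ttl_seconds  # assumption: entries stay fresh for 5 s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, service: str) -> List[str]:
        now = self._clock()
        entry = self._cache.get(service)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh cache hit, no registry call
        try:
            instances = self._fetch(service)
        except Exception:
            if entry is not None:
                return entry[1]  # assumption: stale data beats failing the request
            raise
        self._cache[service] = (now, instances)
        return instances


def backoff_delays(base: float = 0.1, factor: float = 2.0,
                   cap: float = 5.0, attempts: int = 5) -> List[float]:
    """Exponential backoff schedule in seconds for retrying failed lookups."""
    return [min(cap, base * factor ** n) for n in range(attempts)]
```

A short TTL bounds how long a client can keep routing to a removed instance, while the capped backoff keeps a struggling registry from being hit by synchronized retries. Adding random jitter to each delay is a common refinement not shown here.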
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain difference between client-side and server-side discovery
Discuss health check mechanisms and failure handling
Highlight importance of low latency and high availability
Mention caching strategies to reduce load
Describe how distributed registries maintain consistency and availability

Practice

1. What is the main purpose of service discovery in a microservices architecture?
easy
A. To help services find and communicate with each other automatically
B. To store user data securely
C. To manage database transactions
D. To handle user authentication

Solution

  1. Step 1: Understand the role of service discovery

    Service discovery allows microservices to locate each other dynamically without hardcoding addresses.
  2. Step 2: Identify the correct purpose

    It is not about data storage, transactions, or authentication but about service communication.
  3. Final Answer:

    To help services find and communicate with each other automatically -> Option A
  4. Quick Check:

    Service discovery = automatic service location [OK]
Hint: Service discovery = finding services automatically [OK]
Common Mistakes:
  • Confusing service discovery with data storage
  • Thinking it manages user authentication
  • Assuming it handles database transactions
2. Which of the following is a common component used in service discovery for microservices?
easy
A. Load balancer
B. Service registry
C. API gateway
D. Database shard

Solution

  1. Step 1: Identify components related to service discovery

    A service registry keeps track of available service instances and their locations.
  2. Step 2: Differentiate from other components

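    Load balancers distribute traffic, API gateways manage incoming requests, and database shards split data. None of them keeps the record of which service instances exist and where they run; in server-side discovery, the load balancer or gateway itself consults the registry for that information.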
  3. Final Answer:

    Service registry -> Option B
  4. Quick Check:

    Service registry = key for service discovery [OK]
Hint: Service registry stores service locations [OK]
Common Mistakes:
  • Confusing load balancer with service registry
  • Mixing API gateway with service discovery
  • Thinking database shards help find services
3. Consider this simplified service discovery flow:
1. Service A queries the registry for Service B's address.
2. Registry returns Service B's current IP and port.
3. Service A connects to Service B using the returned address.
4. Service B processes the request and responds.

What happens if Service B changes its IP but the registry is not updated?
medium
A. Service B will notify Service A directly
B. Service A will automatically find the new IP
C. The registry will redirect Service A to the new IP
D. Service A will connect to the old IP and fail

Solution

  1. Step 1: Analyze the flow when registry is outdated

    If the registry has an old IP, Service A uses that wrong address to connect.
  2. Step 2: Understand consequences of stale registry data

    Service A cannot find Service B at the old IP, so connection fails; no automatic update or redirection occurs.
  3. Final Answer:

    Service A will connect to the old IP and fail -> Option D
  4. Quick Check:

    Stale registry = failed connection [OK]
Hint: Outdated registry causes failed connections [OK]
Common Mistakes:
  • Assuming automatic IP update without registry refresh
  • Thinking services notify each other directly
  • Believing registry redirects requests automatically
4. A developer notices that service discovery is failing because services cannot find each other. The registry is running, but services do not register themselves. What is the most likely cause?
medium
A. The registry database is full
B. Network latency is too high
C. Services are not sending heartbeat or registration requests to the registry
D. Services are using incorrect API versions

Solution

  1. Step 1: Identify why services are missing in registry

    Services must actively register or send heartbeats to the registry to be discoverable.
  2. Step 2: Eliminate other causes

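    A full registry database or high network latency might cause delays or partial failures, but not a complete absence of registrations; an API version mismatch would typically surface as errors on calls that are actually made, not as no registration attempts at all.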
  3. Final Answer:

    Services are not sending heartbeat or registration requests to the registry -> Option C
  4. Quick Check:

    Missing registration = discovery failure [OK]
Hint: Services must register to be discoverable [OK]
Common Mistakes:
  • Blaming network latency for missing registrations
  • Assuming registry storage limits cause missing services
  • Confusing API version issues with registration problems
5. In a large microservices system with many instances starting and stopping frequently, which service discovery approach best supports scalability and fault tolerance?
hard
A. Using a centralized service registry with periodic health checks and automatic deregistration
B. Hardcoding service IPs in each microservice configuration
C. Using DNS-based service discovery without health checks
D. Relying on client-side caching of service addresses without updates

Solution

  1. Step 1: Evaluate scalability and fault tolerance needs

    Frequent changes require dynamic updates and health checks to avoid stale info and failures.
  2. Step 2: Compare approaches

    Centralized registry with health checks keeps accurate service info; hardcoding or caching causes stale data; DNS without health checks misses failures.
  3. Final Answer:

    Using a centralized service registry with periodic health checks and automatic deregistration -> Option A
  4. Quick Check:

    Dynamic registry + health checks = scalable, fault tolerant [OK]
Hint: Dynamic registry with health checks scales best [OK]
Common Mistakes:
  • Hardcoding IPs causing poor scalability
  • Ignoring health checks leading to stale data
  • Relying on caching without updates