Service discovery concept in Microservices - System Design Exercise

Design: Service Discovery System for Microservices
Design the service discovery mechanism including registration, lookup, and health checking. Out of scope: detailed microservice business logic, security/authentication mechanisms.
Functional Requirements
FR1: Automatically register new microservice instances when they start
FR2: Allow microservices to find and communicate with other services dynamically
FR3: Support health checks to remove unhealthy service instances
FR4: Handle service instance failures gracefully
FR5: Provide low latency service lookup
FR6: Support scaling to thousands of service instances
Non-Functional Requirements
NFR1: System must handle 10,000+ service instances
NFR2: Service lookup latency p99 < 50ms
NFR3: Availability target 99.9% uptime
NFR4: Support dynamic scaling of services without manual intervention
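To make the availability target in NFR3 concrete, the short calculation below converts 99.9% uptime into a downtime budget. The function name and the choice of periods (one year, 30 days) are illustrative assumptions, not part of the exercise.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Minutes of downtime allowed by an availability target over a period."""
    return (1.0 - availability) * period_hours * 60.0


# 99.9% availability leaves roughly 8.76 hours per year of allowed downtime.
per_year = downtime_budget_minutes(0.999, 365 * 24)      # about 525.6 minutes
per_30_days = downtime_budget_minutes(0.999, 30 * 24)    # about 43.2 minutes
```

Because every service depends on lookups, registry outages count against this budget for the whole system, which is one reason the registry is usually replicated.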
Think Before You Design
Key Components
Service Registry (centralized or distributed)
Health Check Service
Client-side Discovery Library
Load Balancer or API Gateway
Heartbeat or TTL mechanism for service liveness
Design Patterns
Client-side service discovery
Server-side service discovery
Push vs Pull health checks
Caching for reducing lookup latency
Leader election for distributed registry
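As an illustration of the pull side of "push vs pull health checks", here is a minimal sketch in which a checker polls each instance and only marks it down after several consecutive failures. The class name, the "UP"/"DOWN" status strings, and the threshold of 3 are assumptions made for this example; a real probe would typically be an HTTP or TCP check.

```python
from typing import Callable, Dict


class PullHealthChecker:
    """Polls registered probes; hypothetical helper, not a real library API."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self._threshold = failure_threshold
        self._probes: Dict[str, Callable[[], bool]] = {}
        self._failures: Dict[str, int] = {}

    def watch(self, instance_id: str, probe: Callable[[], bool]) -> None:
        self._probes[instance_id] = probe
        self._failures[instance_id] = 0

    def run_once(self) -> Dict[str, str]:
        """Run every probe once and return the resulting status per instance."""
        status: Dict[str, str] = {}
        for instance_id, probe in self._probes.items():
            try:
                ok = probe()
            except Exception:
                ok = False  # a probe that raises counts as a failed check
            # A success resets the counter; a failure increments it.
            self._failures[instance_id] = 0 if ok else self._failures[instance_id] + 1
            down = self._failures[instance_id] >= self._threshold
            status[instance_id] = "DOWN" if down else "UP"
        return status


checker = PullHealthChecker(failure_threshold=3)
checker.watch("b-1", lambda: True)    # always healthy
checker.watch("b-2", lambda: False)   # always failing
after_two = [checker.run_once() for _ in range(2)][-1]
after_three = checker.run_once()
```

Requiring consecutive failures trades slower detection for fewer false removals caused by a single dropped probe. In the push model the instance sends heartbeats instead, and the registry expires entries whose heartbeats stop.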
Reference Architecture
          +---------------------+
          |      Service A      |
          |  (Service Client)   |
          +----------+----------+
                     |
                     | Service Lookup Request
                     v
          +---------------------+             +---------------------+
          |  Service Registry   |<------------|  Service Instances  |
          | (e.g. Consul, etcd) |             |  (Service B, C...)  |
          +----------+----------+             +----------+----------+
                     |                                   |
                     |      Health Check Heartbeats      |
                     +-----------------------------------+
Components
Service Registry: Consul / etcd / ZooKeeper. Stores service instance information and provides lookup APIs.
Service Instances: Microservices running in containers or VMs. Register themselves on startup and send health heartbeats.
Health Check Service: Built-in or external health check mechanism. Monitors service instance health and updates the registry.
Client-side Discovery Library: SDK or client library integrated in services. Queries the registry and selects healthy service instances.
Load Balancer / API Gateway: Nginx, Envoy, or a cloud load balancer. Optionally routes requests to discovered service instances.
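The registry's core responsibilities (registration, heartbeats, lookup) can be sketched as a toy in-memory structure. This is a single-process illustration under stated assumptions: the class and method names, the 30-second TTL, and the injectable clock are all hypothetical, and it does not reflect the APIs of Consul, etcd, or ZooKeeper.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Instance:
    instance_id: str
    host: str
    port: int
    last_heartbeat: float


class InMemoryRegistry:
    """Toy registry; production registries replicate this state across nodes."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._services: Dict[str, Dict[str, Instance]] = {}

    def register(self, service: str, instance_id: str, host: str, port: int) -> None:
        self._services.setdefault(service, {})[instance_id] = Instance(
            instance_id, host, port, self._clock())

    def heartbeat(self, service: str, instance_id: str) -> bool:
        inst = self._services.get(service, {}).get(instance_id)
        if inst is None:
            return False  # unknown instance: the caller should re-register
        inst.last_heartbeat = self._clock()
        return True

    def lookup(self, service: str) -> List[Instance]:
        """Return only instances whose last heartbeat is within the TTL."""
        now = self._clock()
        return [i for i in self._services.get(service, {}).values()
                if now - i.last_heartbeat <= self._ttl]


now = [0.0]  # fake clock for the demonstration
registry = InMemoryRegistry(ttl_seconds=30.0, clock=lambda: now[0])
registry.register("orders", "orders-1", "10.0.0.5", 8080)
registry.register("orders", "orders-2", "10.0.0.6", 8080)
now[0] = 20.0
registry.heartbeat("orders", "orders-1")   # orders-2 stays silent
now[0] = 40.0
healthy = [i.instance_id for i in registry.lookup("orders")]
```

In this run only `orders-1` is returned at time 40, because `orders-2` last reported at time 0 and has exceeded the 30-second TTL.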
Request Flow
1. Service instance starts and registers itself with the Service Registry.
2. Service instance periodically sends health check heartbeats to the registry.
3. Client service queries the Service Registry to discover available instances of a target service.
4. Client-side discovery library selects a healthy instance from the registry response.
5. Client sends request directly to the selected service instance.
6. If a service instance fails health checks, the registry removes it from the available list.
7. Clients receive updated service lists on next lookup or via cache invalidation.
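Step 4 of the flow above, where the client-side library picks a healthy instance, might look like the following round-robin sketch. The response shape, the "UP"/"DOWN" status values, and the `RoundRobinSelector` name are assumptions for illustration; other selection policies (random, least connections) are equally valid.

```python
import itertools
from typing import Dict, List, Optional


class RoundRobinSelector:
    """Rotates through the healthy instances of each service."""

    def __init__(self) -> None:
        self._counters: Dict[str, itertools.count] = {}

    def pick(self, service: str, instances: List[dict]) -> Optional[dict]:
        healthy = [i for i in instances if i["status"] == "UP"]
        if not healthy:
            return None  # the caller decides whether to retry or fail fast
        counter = self._counters.setdefault(service, itertools.count())
        return healthy[next(counter) % len(healthy)]


# Hypothetical registry response for a target service.
response = [
    {"instance_id": "b-1", "ip_address": "10.0.0.5", "port": 8080, "status": "UP"},
    {"instance_id": "b-2", "ip_address": "10.0.0.6", "port": 8080, "status": "DOWN"},
    {"instance_id": "b-3", "ip_address": "10.0.0.7", "port": 8080, "status": "UP"},
]
selector = RoundRobinSelector()
picks = [selector.pick("service-b", response)["instance_id"] for _ in range(3)]
```

Here `picks` alternates between `b-1` and `b-3`; the unhealthy `b-2` is never chosen.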
Database Schema
Entities:
- Service: {service_id (PK), service_name}
- ServiceInstance: {instance_id (PK), service_id (FK), ip_address, port, status, last_heartbeat_timestamp}
Relationships:
- One Service has many ServiceInstances
- ServiceInstance status updated by Health Check results
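The two entities can be sketched as relational tables. The example uses Python's built-in `sqlite3` module purely for illustration; column types, the `UNIQUE` and `DEFAULT` constraints, the index, and the 'UP'/'DOWN' status values are assumptions, and registries such as Consul or etcd store this data as key-value entries rather than SQL tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE service (
    service_id   INTEGER PRIMARY KEY,
    service_name TEXT NOT NULL UNIQUE
);
CREATE TABLE service_instance (
    instance_id              TEXT PRIMARY KEY,
    service_id               INTEGER NOT NULL REFERENCES service(service_id),
    ip_address               TEXT NOT NULL,
    port                     INTEGER NOT NULL,
    status                   TEXT NOT NULL DEFAULT 'UP',
    last_heartbeat_timestamp REAL NOT NULL
);
-- Lookups filter by service and status, so index that pair.
CREATE INDEX idx_instance_service_status
    ON service_instance (service_id, status);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO service VALUES (1, 'orders')")
conn.executemany(
    "INSERT INTO service_instance VALUES (?, ?, ?, ?, ?, ?)",
    [("orders-1", 1, "10.0.0.5", 8080, "UP", 100.0),
     ("orders-2", 1, "10.0.0.6", 8080, "DOWN", 40.0)],
)

# Lookup: all healthy instances of a service, by name.
rows = conn.execute(
    "SELECT i.ip_address, i.port FROM service_instance i "
    "JOIN service s ON s.service_id = i.service_id "
    "WHERE s.service_name = ? AND i.status = 'UP'",
    ("orders",),
).fetchall()
```

The lookup returns only `orders-1`, since `orders-2` is marked down.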
Scaling Discussion
Bottlenecks
Service Registry becomes a single point of failure or bottleneck at high scale
High-frequency health checks increase load on the registry and the network
Clients that query the registry too often add lookup latency and registry load
Stale or inconsistent service instance data due to network partitions
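A rough back-of-the-envelope estimate shows how heartbeat and lookup traffic grow with instance count. The 10-second heartbeat interval, the 5-second cache TTL, and the simplification that each client refreshes one cached entry per TTL are assumptions chosen for this example; only the 10,000-instance figure comes from the requirements.

```python
def heartbeat_writes_per_second(instances: int, interval_seconds: float) -> float:
    """Registry writes per second if every instance heartbeats once per interval."""
    return instances / interval_seconds


def lookup_reads_per_second(clients: int, cache_ttl_seconds: float) -> float:
    """Registry reads per second if each client refreshes one cached entry per TTL."""
    return clients / cache_ttl_seconds


writes = heartbeat_writes_per_second(10_000, 10.0)   # 1000.0 writes per second
reads = lookup_reads_per_second(10_000, 5.0)         # 2000.0 reads per second
```

Halving the heartbeat interval doubles the write rate, so faster failure detection has a direct cost in registry load.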
Solutions
Use a distributed, replicated service registry with leader election (e.g. Consul, etcd cluster)
Implement TTL-based registrations and push-based health checks to reduce polling
Add client-side caching with a short TTL, and use exponential backoff when lookups fail
Use quorum-based writes and reads in the registry to ensure consistency
Partition registry data by service or region to reduce load
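The client-side caching and backoff idea can be sketched as a small wrapper around a registry call. Everything here is hypothetical: `CachedLookup`, the `fetch` callable, the 5-second TTL, and the backoff parameters are illustrative choices. The sketch serves stale data while the registry is unreachable, which is one possible policy, and it keeps a single backoff state for the whole registry rather than one per service.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachedLookup:
    """Caches lookups for a short TTL and backs off after registry failures."""

    def __init__(self, fetch: Callable[[str], List[str]], ttl_seconds: float = 5.0,
                 base_backoff: float = 0.5, max_backoff: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._base = base_backoff
        self._max = max_backoff
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}
        self._failures = 0
        self._retry_at = 0.0

    def get(self, service: str) -> List[str]:
        now = self._clock()
        cached = self._cache.get(service)
        if cached and now - cached[0] < self._ttl:
            return cached[1]  # fresh cache hit, no registry call
        if cached and now < self._retry_at:
            return cached[1]  # still backing off: serve the stale entry
        try:
            instances = self._fetch(service)
        except Exception:
            # Double the wait after each consecutive failure, up to a cap.
            self._failures += 1
            delay = min(self._max, self._base * 2 ** (self._failures - 1))
            self._retry_at = now + delay
            if cached:
                return cached[1]
            raise  # nothing cached to fall back on
        self._failures = 0
        self._retry_at = 0.0
        self._cache[service] = (now, instances)
        return instances


now = [0.0]            # fake clock
calls: List[str] = []  # records every real registry call
registry_up = [True]

def fetch(service: str) -> List[str]:
    calls.append(service)
    if not registry_up[0]:
        raise ConnectionError("registry unreachable")
    return ["10.0.0.5:8080"]

lookup = CachedLookup(fetch, ttl_seconds=5.0, clock=lambda: now[0])
lookup.get("orders")
lookup.get("orders")           # served from cache
now[0], registry_up[0] = 6.0, False
stale = lookup.get("orders")   # fetch fails, stale entry returned
now[0] = 6.2
lookup.get("orders")           # inside the backoff window, no fetch
now[0] = 7.0
lookup.get("orders")           # backoff elapsed, fetch attempted again
```

In this run the registry is contacted three times for five lookups, and callers keep receiving the last known instance list during the outage.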
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain the difference between client-side and server-side discovery
Discuss health check mechanisms and failure handling
Highlight the importance of low latency and high availability
Mention caching strategies to reduce load
Describe how distributed registries maintain consistency and availability