
Correlation IDs in Microservices - System Design Guide

Problem Statement
When multiple microservices handle parts of the same user request, it becomes nearly impossible to trace the full journey of that request across services. Without a shared identifier, debugging failures or performance issues requires sifting through disconnected logs, causing delays and errors in root cause analysis.
Solution
Assign a unique correlation ID to each user request at the entry point (or reuse one the caller already supplied) and pass it to every downstream service, typically in a request header. Each service includes this ID in its log entries, so developers can trace the entire request flow end-to-end by filtering logs for that correlation ID.
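A minimal sketch of the entry-point and propagation logic, assuming HTTP-style headers represented as a plain dict and the header name `X-Correlation-ID`. That name is a common convention rather than a standard (`X-Request-ID` is also widely used), and the helper functions are hypothetical.

```python
import uuid
from typing import Optional

# Assumed header name; whichever name you choose, every service must use the same one.
CORRELATION_HEADER = "X-Correlation-ID"


def get_or_create_correlation_id(incoming_headers: dict) -> str:
    """Reuse the caller's correlation ID if present, otherwise generate a new one."""
    existing = incoming_headers.get(CORRELATION_HEADER)
    if existing:
        return existing
    return str(uuid.uuid4())


def outgoing_headers(correlation_id: str, extra: Optional[dict] = None) -> dict:
    """Build headers for a downstream call that carry the same correlation ID."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = correlation_id
    return headers


# Entry point: no ID arrives, so one is generated.
cid = get_or_create_correlation_id({})
# Downstream call: the same ID travels with the request.
downstream = outgoing_headers(cid, {"Accept": "application/json"})
# Downstream service: reuses the ID instead of generating a new one.
assert get_or_create_correlation_id(downstream) == cid
```

Real HTTP frameworks usually expose case-insensitive header maps; a plain dict, as used here, is case-sensitive, so a header sent as `x-correlation-id` would be missed.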
Architecture
[Diagram: Client/User → API Gateway → Services A, B, C]

This diagram shows a client request entering through an API Gateway where a correlation ID is assigned. The ID flows through multiple services (A, B, C), allowing logs from all services to be linked by the same correlation ID.
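The flow in the diagram can be simulated in a single process as a toy sketch: the gateway assigns the ID, services A, B and C each log with it, and one request's trace is recovered by filtering a shared log store. The in-memory `LOG` list stands in for a centralized logging platform, and all function names are illustrative.

```python
import uuid

# In-memory stand-in for a centralized log store.
LOG = []


def log(service, correlation_id, message):
    LOG.append({"service": service, "correlation_id": correlation_id, "message": message})


def service_c(correlation_id):
    log("C", correlation_id, "writing result")


def service_b(correlation_id):
    log("B", correlation_id, "enriching data")
    service_c(correlation_id)


def service_a(correlation_id):
    log("A", correlation_id, "validating request")
    service_b(correlation_id)


def api_gateway(request_headers):
    # The gateway is the entry point: reuse an incoming ID or assign a new one.
    correlation_id = request_headers.get("X-Correlation-ID") or str(uuid.uuid4())
    log("gateway", correlation_id, "request received")
    service_a(correlation_id)
    return correlation_id


def trace(correlation_id):
    """Return every log entry for one request, in the order it was written."""
    return [entry for entry in LOG if entry["correlation_id"] == correlation_id]


first = api_gateway({})
second = api_gateway({})  # a second, unrelated request
print([e["service"] for e in trace(first)])  # ['gateway', 'A', 'B', 'C']
```

Even though both requests write to the same store, filtering by ID separates them cleanly.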

Trade-offs
✓ Pros
Enables end-to-end tracing of requests across distributed services.
Simplifies debugging by linking logs from multiple services with a single ID.
Improves monitoring and alerting by correlating related events.
Supports distributed tracing tools and observability platforms.
✗ Cons
Requires all services to propagate the correlation ID consistently.
Adds slight overhead in request headers and logging.
Needs careful handling to avoid ID loss or duplication in asynchronous flows.
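In Python, one way to reduce the risk of losing the ID in asynchronous code is to keep it in a `contextvars.ContextVar` and let a `logging.Filter` stamp it onto every record, so individual log calls cannot forget it. This is a sketch rather than a prescribed mechanism; `correlation_id_var`, `CorrelationIdFilter` and `handle_request` are hypothetical names. It relies on asyncio tasks copying the current context when they are created, so work spawned with `asyncio.create_task` or `asyncio.gather` inherits the ID.

```python
import asyncio
import contextvars
import logging
import uuid

# Holds the current request's correlation ID; "-" when no request is active.
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id_var.get()
        return True


handler = logging.StreamHandler()
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(logging.Formatter("%(correlation_id)s %(message)s"))
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate output via the root logger


async def charge_payment():
    # Runs in its own task, but sees the ID copied from the parent's context.
    logger.info("charging payment")
    return correlation_id_var.get()


async def handle_request():
    correlation_id_var.set(str(uuid.uuid4()))
    logger.info("request received")
    return await asyncio.create_task(charge_payment())


async def main():
    # Two concurrent requests keep separate IDs because each task has its own context.
    return await asyncio.gather(handle_request(), handle_request())


ids = asyncio.run(main())
```

In-process context does not survive a message queue hop, so for queued work the ID has to be written into the message itself (for example as a header or metadata field) and restored by the consumer.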
Use when your system has multiple microservices handling parts of the same user request, especially (as a rough rule of thumb) when you run more than 10 services or handle more than 1,000 requests per second.
Avoid if your system is a single monolith or has very simple request flows where tracing across services is unnecessary.
Real World Examples
Uber
Uber uses correlation IDs to trace ride requests across dozens of microservices, enabling quick diagnosis of delays or failures in the booking process.
Netflix
Netflix propagates correlation IDs through its streaming and recommendation services to monitor user sessions and troubleshoot playback issues.
Amazon
Amazon uses correlation IDs to track orders as they move through inventory, payment, and shipping microservices, ensuring smooth order fulfillment.
Code Example
The before code logs requests without any shared identifier, making it hard to trace. The after code assigns a unique correlation ID at the entry point and passes it explicitly to downstream services, which log it to enable tracing.
### Before: No correlation ID propagation
import logging

def service_a(request):
    logging.info(f"Processing request {request}")
    # calls service_b without passing any ID
    service_b()

def service_b():
    logging.info("Processing in service B")


### After: Correlation ID propagation
import logging
import uuid

# Emit INFO messages; the root logger's default level is WARNING
logging.basicConfig(level=logging.INFO)

def service_a(request):
    # Reuse the request's correlation ID, or assign one at the entry point
    correlation_id = getattr(request, "correlation_id", None)
    if correlation_id is None:
        correlation_id = str(uuid.uuid4())
        request.correlation_id = correlation_id
    logging.info(f"Processing request {request} with correlation_id={correlation_id}")
    # Pass the ID explicitly so service B can log it too
    service_b(correlation_id)

def service_b(correlation_id):
    logging.info(f"Processing in service B with correlation_id={correlation_id}")
Alternatives
Distributed Tracing
Distributed tracing extends correlation IDs by adding timing and metadata at each service hop to build a detailed trace.
Use when: You need detailed performance insights and latency breakdowns beyond simple request correlation.
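For comparison, distributed tracing typically propagates a trace ID (shared by every hop, like a correlation ID) plus a per-hop span ID, and records timing for each hop. The sketch below builds and parses a header in the W3C Trace Context `traceparent` layout (`version-traceid-parentid-traceflags`); the `Span` class and helper functions are illustrative only and are not the API of any tracing library such as OpenTelemetry.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every hop, like a correlation ID
    span_id: str              # unique to this hop
    parent_id: Optional[str]  # span ID of the caller; None for the root span
    name: str
    start: float = field(default_factory=time.perf_counter)
    end: Optional[float] = None

    def finish(self):
        self.end = time.perf_counter()

    @property
    def duration_ms(self):
        return (self.end - self.start) * 1000


def traceparent(span):
    # W3C Trace Context layout: version-traceid-parentid-traceflags ("01" = sampled)
    return f"00-{span.trace_id}-{span.span_id}-01"


def start_span(name, incoming_traceparent=None):
    if incoming_traceparent is None:
        # Root span: start a new trace (16-byte trace ID, 8-byte span ID, as hex).
        return Span(secrets.token_hex(16), secrets.token_hex(8), None, name)
    _version, trace_id, parent_id, _flags = incoming_traceparent.split("-")
    # Child span: same trace ID, new span ID, and a link to its parent.
    return Span(trace_id, secrets.token_hex(8), parent_id, name)


gateway = start_span("gateway")
service_a = start_span("service-a", traceparent(gateway))  # header sent downstream
time.sleep(0.01)  # simulated work in service A
service_a.finish()
gateway.finish()

for span in (gateway, service_a):
    print(span.name, span.trace_id[:8], span.parent_id, f"{span.duration_ms:.1f} ms")
```

The shared trace ID alone gives you correlation; the parent links and timings are what let a tracing backend draw the request as a tree with per-hop latency.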
Logging Context Propagation
Logging context propagation automatically attaches context like user ID or session ID to logs without a unique correlation ID.
Use when: You want to enrich logs with context but do not require full request tracing across services.
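A minimal illustration with the standard library's `logging.LoggerAdapter`, which attaches a fixed `extra` dict to every record so the formatter can print fields such as a user ID and session ID. The field names and values are hypothetical. Unlike the correlation ID approach, nothing here links the log lines that different services write for the same request.

```python
import io
import logging

stream = io.StringIO()  # capture output so it can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("user=%(user_id)s session=%(session_id)s %(message)s"))

base_logger = logging.getLogger("checkout")
base_logger.addHandler(handler)
base_logger.setLevel(logging.INFO)
base_logger.propagate = False  # keep output to this handler only

# Every call through the adapter carries the same context fields.
log = logging.LoggerAdapter(base_logger, {"user_id": "u-42", "session_id": "s-7"})
log.info("cart loaded")
log.info("checkout started")

print(stream.getvalue(), end="")
# user=u-42 session=s-7 cart loaded
# user=u-42 session=s-7 checkout started
```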
Summary
Correlation IDs uniquely tag each user request to trace it across multiple microservices.
They enable easier debugging and monitoring by linking logs from different services.
Proper propagation and logging of correlation IDs are essential for effective observability.