
Three pillars (metrics, logs, traces) in Microservices - System Design Guide

Problem Statement
When microservices fail or behave unexpectedly, teams struggle to find the root cause quickly because data about system behavior is scattered or incomplete. Without a clear way to observe system health and diagnose issues, outages last longer and degrade user experience.
Solution
The three pillars (metrics, logs, and traces) work together to provide a complete picture of system health and behavior. Metrics are numeric measurements aggregated over time, such as request rate, error rate, and latency. Logs are timestamped records of individual events and errors. Traces follow a single request along its path across services. Together, they enable fast detection, diagnosis, and resolution of problems.
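As a minimal sketch of how one request can produce all three signals, the example below uses only the Python standard library. The service name `checkout`, the `handle_request` function, and the in-memory `METRICS` and `SPANS` stores are hypothetical stand-ins for a real metrics database and trace storage; they are assumptions for illustration, not part of any particular observability library.

```python
import json
import logging
import time
import uuid
from collections import defaultdict

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("checkout")  # hypothetical service name

# Hypothetical in-memory stand-ins for a metrics store and a trace store.
METRICS = defaultdict(list)  # metric name -> list of observed values
SPANS = []                   # one dict per finished span


def handle_request(order_id, trace_id=None):
    """Handle one request and emit a log line, two metrics, and a span."""
    # Reuse the caller's trace ID if there is one, otherwise start a new trace.
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()

    # Log: a detailed, structured event that carries the trace ID.
    logger.info(json.dumps({
        "event": "order_received",
        "order_id": order_id,
        "trace_id": trace_id,
    }))

    duration_ms = (time.perf_counter() - start) * 1000.0

    # Metrics: numeric values that can be aggregated over time.
    METRICS["requests_total"].append(1)
    METRICS["request_duration_ms"].append(duration_ms)

    # Trace: one span describing this service's part of the request.
    SPANS.append({
        "trace_id": trace_id,
        "service": "checkout",
        "name": "handle_request",
        "duration_ms": duration_ms,
    })
    return trace_id
```

Because the log line and the span share the same `trace_id`, an engineer who starts from either one can find the other.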
Architecture
[Diagram labels: Microservice Instance, Metrics DB, Tracing Instrumentation, Trace Storage]

This diagram shows how microservices emit metrics, logs, and traces to their respective storage systems. Dashboards and alerting tools consume this data to provide observability.
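The flow described here (services emit, separate stores receive, alerting consumes) can be sketched roughly as follows. The `TelemetryPipeline` class, the `latency_ms` metric name, and the in-memory lists are hypothetical; a real deployment would use dedicated storage backends rather than Python lists.

```python
from statistics import mean


class TelemetryPipeline:
    """Route each signal to its own store; let an alert rule read metrics."""

    def __init__(self):
        # One hypothetical in-memory store per pillar.
        self.stores = {"metric": [], "log": [], "trace": []}

    def emit(self, kind, record):
        """Append a record to the store that matches its kind."""
        if kind not in self.stores:
            raise ValueError(f"unknown telemetry kind: {kind!r}")
        self.stores[kind].append(record)

    def latency_alert(self, threshold_ms, window=5):
        """Return True if the mean of the last `window` latency samples
        exceeds `threshold_ms`."""
        samples = [
            m["value"]
            for m in self.stores["metric"]
            if m["name"] == "latency_ms"
        ]
        recent = samples[-window:]
        return bool(recent) and mean(recent) > threshold_ms
```

A service would call `emit("metric", {"name": "latency_ms", "value": 120})` on each request, while an alerting job would poll `latency_alert(...)` on a schedule.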

Trade-offs
✓ Pros
Provides comprehensive observability by combining numeric data, detailed events, and request flows.
Enables faster root cause analysis by correlating metrics spikes with logs and traces.
Supports proactive alerting and capacity planning through metrics.
Improves understanding of distributed system behavior with traces.
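The correlation step mentioned in the list above (going from a metrics spike to the matching logs and traces) might look like the sketch below. The record fields `minute`, `level`, and `trace_id`, the function names, and the "three times the median" spike rule are all assumptions chosen for illustration.

```python
from statistics import median


def find_spike_minutes(error_counts, factor=3.0):
    """Return the minutes whose error count exceeds `factor` times the
    median count (the median is floored at 1 to avoid a zero baseline)."""
    if not error_counts:
        return []
    baseline = max(median(error_counts.values()), 1)
    return sorted(m for m, c in error_counts.items() if c > factor * baseline)


def traces_for_minutes(logs, minutes):
    """Return the distinct trace IDs of ERROR logs in the given minutes."""
    wanted = set(minutes)
    return sorted({
        log["trace_id"]
        for log in logs
        if log["minute"] in wanted and log["level"] == "ERROR"
    })
```

The metric narrows the search to a time window, the logs in that window name the failing requests, and the trace IDs point to the full request paths worth inspecting.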
✗ Cons
Requires additional infrastructure and storage for three different data types.
Increases complexity in data collection, processing, and correlation.
Needs careful design (for example, trace sampling and sensible log levels) to avoid high overhead and noisy data.
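One common way to limit tracing overhead is to record only a fraction of traces. The sketch below assumes a hypothetical `should_sample` helper that hashes the trace ID, so every service reaches the same keep-or-drop decision for a given request without coordinating. It illustrates the idea only and is not the API of any specific tracing library.

```python
import hashlib


def should_sample(trace_id, rate):
    """Decide deterministically whether to keep a trace.

    `rate` is the fraction of traces to keep, between 0.0 and 1.0.
    The same trace ID always yields the same decision.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Metrics are usually kept unsampled because they are cheap to aggregate, while traces and verbose logs are the signals where sampling saves the most storage.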
Use when: operating distributed microservices at scale, with complex interactions and a need for fast incident response and performance monitoring.
Avoid when: running simple, monolithic applications with low traffic, where the overhead of collecting and managing all three data types outweighs the benefits.
Real World Examples
Netflix
Uses metrics to monitor streaming quality, logs for error details, and distributed tracing to track user requests across microservices for quick troubleshooting.
Uber
Combines metrics, logs, and traces to monitor ride requests and driver matching services, enabling rapid detection and resolution of latency issues.
Amazon
Employs the three pillars to maintain high availability of its e-commerce platform by correlating system health metrics with logs and traces from thousands of microservices.
Alternatives
Single pillar monitoring
Focuses on only one type of observability data, such as logs only or metrics only.
Use when: system complexity is low and full observability is not required.
Event-driven monitoring
Relies primarily on events and alerts rather than continuous metrics and traces.
Use when: system events are rare but critical, and detailed tracing is unnecessary.
Summary
The three pillars of observability are metrics, logs, and traces, each providing unique insights into system behavior.
Together, they enable fast detection and diagnosis of issues in complex microservices environments.
Implementing all three requires careful design to balance observability benefits with system overhead.