Three pillars (metrics, logs, traces) in Microservices - Scalability & System Analysis

Scalability Analysis - Three pillars (metrics, logs, traces)

Growth Table: Scaling Observability in Microservices

Users/Traffic	Metrics	Logs	Traces
100 users	Basic CPU, memory, request counts collected on few services	Logs stored locally, simple text files, manual inspection	Traces sampled at low rate, few services instrumented
10K users	Centralized metrics collection with Prometheus or similar; alerting added	Logs shipped to central system (e.g., ELK stack); indexing starts	Distributed tracing enabled on key services; sampling rate increased
1M users	High cardinality metrics; long-term storage; aggregation and downsampling	Logs volume grows; need log retention policies and archiving; indexing optimized	Traces collected for most requests; storage and query performance optimized
100M users	Metrics sharded and federated; multi-tenant isolation; advanced anomaly detection	Logs stored in scalable object storage; cold and hot storage tiers; AI-based log analysis	Traces sampled intelligently; trace data linked with metrics and logs for root cause
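
The "aggregation and downsampling" step in the 1M users row can be illustrated with a small sketch. It averages raw samples into fixed time windows, which is one common way to shrink long-term metric storage. The function name `downsample` and the (timestamp, value) point format are assumptions made for this example, not part of any specific time-series database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, float]  # (unix timestamp in seconds, metric value)


def downsample(points: Iterable[Point], window_sec: int) -> List[Point]:
    """Average raw points into fixed windows of `window_sec` seconds.

    Each output point is (window start timestamp, mean of values in window).
    """
    if window_sec <= 0:
        raise ValueError("window_sec must be positive")

    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        window_start = ts - ts % window_sec  # align to the window boundary
        buckets[window_start].append(value)

    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]
```

Averaging loses peaks, so a real pipeline might also keep the min, max, or a percentile per window; that is the fidelity versus cost trade-off this lesson returns to later.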

First Bottleneck

At small scale, logs stored locally become hard to manage and search as volume grows.

At medium scale, centralized logging systems face storage and indexing bottlenecks due to high log volume.

At large scale, trace data storage and query performance degrade because traces are large and complex.

Overall, the first bottleneck is usually the logging infrastructure because logs grow fastest and require heavy indexing.

Scaling Solutions

Metrics: Use aggregation, downsampling, and sharding; employ time-series databases optimized for high cardinality.
Logs: Implement centralized log management with scalable storage (e.g., Elasticsearch clusters, cloud object storage); apply log retention and archiving policies; use indexing and compression.
Traces: Use sampling strategies to reduce volume; store traces in specialized databases; correlate traces with metrics and logs for efficient debugging.
General: Use horizontal scaling for collectors and storage; apply caching and tiered storage; automate alerting and anomaly detection.
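
As a rough illustration of the trace sampling item above, the sketch below makes a deterministic keep/drop decision from the trace ID. Hashing the ID means every service that sees the same trace reaches the same decision without coordination. The function name `should_sample` and the choice of SHA-256 are assumptions for this example; production tracers ship their own samplers.

```python
import hashlib


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Return True if this trace should be kept.

    The trace ID is hashed to a 64-bit bucket; the trace is kept when the
    bucket falls in the lowest `sample_rate` fraction of the bucket range.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")

    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # value in 0 .. 2**64 - 1
    return bucket < sample_rate * 2**64
```

This is head-based sampling: the decision is made before the request's outcome is known, so rare errors can be dropped. Keeping every slow or failed trace requires a tail-based approach that decides after the trace completes, at the cost of buffering.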

Back-of-Envelope Cost Analysis

Assuming 1M users generating 10 requests/sec each:

Total requests: 10 million/sec
Metrics: 1-10 million data points/sec (roughly 0.1 to 1 data point per request); requires a horizontally scalable TSDB (e.g., Cortex, or sharded and federated Prometheus)
Logs: Each request generates ~1KB logs -> ~10GB/sec raw logs; needs compression and tiered storage
Traces: Sampling 1% -> 100K traces/sec; each trace ~10KB -> ~1GB/sec storage
Network: High bandwidth needed for shipping logs and traces; consider local aggregation
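
The arithmetic above can be reproduced with a short script. It uses only the assumptions already stated (1M users, 10 requests/sec each, ~1KB of logs per request, 1% trace sampling, ~10KB per trace) and decimal units (1 GB = 10^9 bytes). The per-day log figure is an extra derived value, and the name `estimate` is chosen for this example.

```python
USERS = 1_000_000
REQUESTS_PER_USER_PER_SEC = 10
LOG_BYTES_PER_REQUEST = 1_000   # ~1 KB of logs per request
TRACE_SAMPLE_PERCENT = 1        # 1% of requests are traced
TRACE_BYTES = 10_000            # ~10 KB per trace
SECONDS_PER_DAY = 86_400


def estimate() -> dict:
    """Return per-second volumes for requests, logs, and traces."""
    requests_per_sec = USERS * REQUESTS_PER_USER_PER_SEC
    log_bytes_per_sec = requests_per_sec * LOG_BYTES_PER_REQUEST
    traces_per_sec = requests_per_sec * TRACE_SAMPLE_PERCENT // 100
    trace_bytes_per_sec = traces_per_sec * TRACE_BYTES
    return {
        "requests_per_sec": requests_per_sec,
        "log_gb_per_sec": log_bytes_per_sec / 10**9,
        "log_tb_per_day": log_bytes_per_sec * SECONDS_PER_DAY / 10**12,
        "traces_per_sec": traces_per_sec,
        "trace_gb_per_sec": trace_bytes_per_sec / 10**9,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,}")
```

At these rates, raw logs alone come to about 864 TB per day before compression, which is why retention policies and tiered storage matter more than any single database choice.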

Interview Tip

Structure your scalability discussion by:

Explaining the role of each pillar (metrics, logs, traces) in observability.
Describing how data volume grows with users and requests.
Identifying bottlenecks in storage, indexing, and query performance.
Suggesting concrete scaling solutions like sampling, sharding, and tiered storage.
Discussing trade-offs between data fidelity and cost.

Self Check

Your database handles 1000 QPS for logs. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: First reduce volume at the source with log sampling or filtering, then scale the logging database horizontally by sharding so that writes spread across nodes. Adding replicas helps query (read) load, not write throughput.
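
One way to do the "sampling or filtering" step at the source is a filter in the application's logger. The sketch below uses Python's standard `logging.Filter` hook: it always keeps warnings and errors and passes only a fraction of lower-severity records. The class name `SamplingFilter` and the `keep_ratio` parameter are assumptions for this example.

```python
import logging
import random
from typing import Optional


class SamplingFilter(logging.Filter):
    """Keep all WARNING+ records; keep lower levels with probability keep_ratio."""

    def __init__(self, keep_ratio: float, rng: Optional[random.Random] = None):
        super().__init__()
        if not 0.0 <= keep_ratio <= 1.0:
            raise ValueError("keep_ratio must be between 0.0 and 1.0")
        self.keep_ratio = keep_ratio
        # An injectable RNG makes the filter reproducible in tests.
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never drop warnings or errors
        return self.rng.random() < self.keep_ratio
```

Attaching it with `handler.addFilter(SamplingFilter(keep_ratio=0.1))` would ship roughly 10% of INFO and DEBUG records while preserving every warning and error, which cuts write load before any database scaling is needed.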

Key Result

Logging infrastructure is the first bottleneck as log volume grows fastest; scaling requires sampling, sharding, and tiered storage across metrics, logs, and traces.

Practice

1. Which of the following best describes the role of metrics in microservices monitoring?

easy

A. They track the path of a request through multiple services.

B. They record detailed events and errors in the system.

C. They provide numerical data about system performance over time.

D. They store configuration settings for microservices.

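Solution

Step 1: Understand what metrics represent. Metrics are numerical measurements, such as request counts, latency, or CPU usage, recorded over time.

Step 2: Differentiate metrics from logs and traces. Logs record detailed events and errors (option B), and traces follow a request's path across services (option A). Storing configuration settings (option D) is not an observability function.

Final Answer: C. They provide numerical data about system performance over time.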
