Three pillars (metrics, logs, traces) in Microservices - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding the role of metrics in microservices monitoring

Which of the following best describes the primary purpose of metrics in monitoring microservices?

A. Metrics offer aggregated numerical data over time to track system performance and health.
B. Metrics trace the path of a request through multiple services to identify latency bottlenecks.
C. Metrics provide detailed records of individual events and errors for debugging purposes.
D. Metrics store raw logs from all microservices for audit and compliance.
💡 Hint

Think about what numbers like CPU usage or request counts represent.

Architecture
intermediate
Choosing the right pillar for debugging a failed request

You notice a user request failed in your microservices system. Which pillar should you consult first to find detailed information about the error and its context?

A. Metrics, because they show aggregated error rates over time.
B. Traces, because they provide numerical summaries of system health.
C. Logs, because they contain detailed event records and error messages.
D. Metrics and traces only, since logs are too large to be useful.
💡 Hint

Consider which pillar records detailed messages about what happened.

Scaling
advanced
Scaling trace collection in a high-traffic microservices environment

Your microservices system handles millions of requests per minute. You want to collect traces to understand request flows without overwhelming storage or processing. Which approach is best?

A. Only collect metrics and logs, ignoring traces to save resources.
B. Collect traces for every request and store all data indefinitely.
C. Collect traces only for requests that do not generate logs.
D. Sample a small percentage of requests for tracing and keep traces for a limited time.
💡 Hint

Think about balancing data volume and usefulness.
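
The sampling idea in this question can be sketched as follows. This is a minimal illustration, not a production sampler: the 1% rate, the `should_sample` function, and the `trace-N` IDs are assumptions made up for the example. Hashing the trace ID makes the decision deterministic, so every service that sees the same ID reaches the same keep-or-drop decision.

```python
import hashlib

SAMPLE_RATE = 0.01  # assumed for illustration: keep roughly 1% of traces


def should_sample(trace_id: str, sample_rate: float = SAMPLE_RATE) -> bool:
    """Decide from the trace ID alone whether this trace is kept."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Count how many of 100,000 hypothetical trace IDs would be kept.
kept = sum(should_sample(f"trace-{i}") for i in range(100_000))
print(f"kept {kept} of 100000 traces")
```

Limiting how long the sampled traces are retained is a separate storage-side setting and is not shown here.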

Tradeoff
advanced
Tradeoffs between logs and metrics for alerting

When setting up alerts for your microservices, what is a key tradeoff between using logs versus metrics?

A. Logs provide fast, aggregated alerts but lack detail; metrics provide detailed alerts but are slow.
B. Metrics enable fast, low-overhead alerts but may miss detailed context; logs provide rich detail but are costly to process in real-time.
C. Logs and metrics both provide identical alerting capabilities with no tradeoffs.
D. Metrics are only useful for debugging, while logs are only useful for performance monitoring.
💡 Hint

Consider speed and detail in alerting.
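
To make the speed-versus-detail contrast concrete, here is a small sketch of the same error-rate alert evaluated two ways. The 5% threshold, the function names, and the sample log lines are assumptions for illustration only. The metric-based check compares two pre-aggregated counters, while the log-based check has to parse every entry but can report which messages failed.

```python
import json

ERROR_RATE_THRESHOLD = 0.05  # assumed alert threshold: more than 5% of requests failing


def alert_from_metrics(request_count: int, error_count: int) -> bool:
    """Cheap check: compares two pre-aggregated counters."""
    if request_count == 0:
        return False
    return error_count / request_count > ERROR_RATE_THRESHOLD


def alert_from_logs(log_lines):
    """Costlier check: parses every log line, but returns the failing messages too."""
    total = 0
    error_messages = []
    for line in log_lines:
        entry = json.loads(line)
        total += 1
        if entry["level"] == "ERROR":
            error_messages.append(entry["message"])
    firing = total > 0 and len(error_messages) / total > ERROR_RATE_THRESHOLD
    return firing, error_messages


# Hypothetical log lines for the example.
logs = [
    '{"level": "INFO", "message": "ok"}',
    '{"level": "ERROR", "message": "Failed to connect"}',
]
print(alert_from_metrics(request_count=100, error_count=10))
print(alert_from_logs(logs))
```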

Estimation
expert
Estimating storage needs for logs, metrics, and traces

Your microservices system handles 10 million requests per minute, emits 100 metrics per service per minute across 50 services, and writes 1 log entry per request. Estimate the relative daily storage needed for logs, metrics, and traces, assuming traces are collected for 1% of requests and each trace is 10 times the size of a log entry.

A. Logs: 14 TB, Metrics: 720 MB, Traces: 1.4 TB
B. Logs: 1.4 TB, Metrics: 720 GB, Traces: 14 TB
C. Logs: 14 GB, Metrics: 720 MB, Traces: 140 GB
D. Logs: 140 TB, Metrics: 7.2 GB, Traces: 14 TB
💡 Hint

Calculate logs as requests × log size × minutes per day; metrics as metrics per service × services × sample size × minutes per day; traces as 1% of requests × 10 × log size × minutes per day.
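
The arithmetic in the hint can be worked through in a few lines. The problem does not state per-entry sizes, so the 1,000-byte log entry and 100-byte metric sample below are assumptions chosen only for illustration. With them, the daily totals come to about 14.4 TB of logs, 720 MB of metrics, and 1.44 TB of traces. Whatever log size is assumed, traces come to one tenth of the log volume, because 1% × 10 = 0.1.

```python
REQUESTS_PER_MINUTE = 10_000_000
METRICS_PER_SERVICE_PER_MINUTE = 100
SERVICES = 50
TRACE_SAMPLE_PERCENT = 1       # traces collected for 1% of requests
TRACE_SIZE_FACTOR = 10         # each trace is 10x the size of a log entry
MINUTES_PER_DAY = 24 * 60

# Assumptions NOT given in the problem statement:
LOG_ENTRY_BYTES = 1_000
METRIC_SAMPLE_BYTES = 100


def daily_storage_bytes() -> dict:
    """Return the estimated bytes written per day for each pillar."""
    logs = REQUESTS_PER_MINUTE * LOG_ENTRY_BYTES * MINUTES_PER_DAY
    metrics = (METRICS_PER_SERVICE_PER_MINUTE * SERVICES
               * METRIC_SAMPLE_BYTES * MINUTES_PER_DAY)
    sampled_requests = REQUESTS_PER_MINUTE * TRACE_SAMPLE_PERCENT // 100
    traces = (sampled_requests * TRACE_SIZE_FACTOR
              * LOG_ENTRY_BYTES * MINUTES_PER_DAY)
    return {"logs": logs, "metrics": metrics, "traces": traces}


storage = daily_storage_bytes()
for pillar, size in storage.items():
    print(f"{pillar}: {size / 1e12:.5f} TB")
```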

Practice

1. Which of the following best describes the role of metrics in microservices monitoring?
easy
A. They track the path of a request through multiple services.
B. They record detailed events and errors in the system.
C. They provide numerical data about system performance over time.
D. They store configuration settings for microservices.

Solution

  1. Step 1: Understand what metrics represent

    Metrics are numerical measurements like CPU usage, request counts, or latency that show system health over time.
  2. Step 2: Differentiate metrics from logs and traces

    Logs record events, traces follow request paths, but metrics summarize performance data.
  3. Final Answer:

    They provide numerical data about system performance over time. -> Option C
  4. Quick Check:

    Metrics = numerical performance data [OK]
Hint: Metrics = numbers about performance, not events or paths [OK]
Common Mistakes:
  • Confusing metrics with logs as event records
  • Thinking traces are numerical data
  • Assuming metrics store configurations
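
As a rough illustration of how metrics summarize behavior over time, the sketch below collapses individual request observations into per-minute numbers. The observation tuples and the `aggregate_per_minute` helper are hypothetical and exist only for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw observations: (seconds since start, latency in ms, succeeded)
observations = [
    (0, 120, True),
    (15, 80, True),
    (42, 300, False),
    (61, 95, True),
    (119, 105, True),
]


def aggregate_per_minute(obs):
    """Collapse individual request observations into per-minute metrics."""
    buckets = defaultdict(list)
    for ts, latency_ms, ok in obs:
        buckets[ts // 60].append((latency_ms, ok))

    metrics = {}
    for minute, items in sorted(buckets.items()):
        metrics[minute] = {
            "request_count": len(items),
            "error_count": sum(1 for _, ok in items if not ok),
            "avg_latency_ms": mean(lat for lat, _ in items),
        }
    return metrics


per_minute = aggregate_per_minute(observations)
print(per_minute)
```

The individual events are gone after aggregation; only counts and averages remain, which is what separates metrics from logs and traces.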
2. Which of the following is a structured (JSON) log entry, the format best suited to a microservice system?
easy
A. [2024-06-01 12:00:00] ERROR Failed to connect
B. {"timestamp": "2024-06-01T12:00:00Z", "level": "ERROR", "message": "Failed to connect"}
C. Failed to connect
D. ERROR 2024-06-01T12:00:00Z Failed to connect

Solution

  1. Step 1: Identify standard log formats

    JSON format is widely used for structured logs in microservices for easy parsing and querying.
  2. Step 2: Compare options for correctness

    {"timestamp": "2024-06-01T12:00:00Z", "level": "ERROR", "message": "Failed to connect"} is a valid JSON log entry with timestamp, level, and message fields. Others are less structured or not JSON.
  3. Final Answer:

    {"timestamp": "2024-06-01T12:00:00Z", "level": "ERROR", "message": "Failed to connect"} -> Option B
  4. Quick Check:

    Structured JSON log entry = Option B [OK]
Hint: Logs are best as structured JSON for easy use [OK]
Common Mistakes:
  • Using unstructured plain text logs
  • Confusing XML-like logs with JSON
  • Ignoring timestamp or level fields
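
A structured entry like the one in Option B can be produced with the standard library alone. This is a minimal sketch; the `make_log_entry` helper is a name invented for the example, and real services typically rely on a logging library to do this.

```python
import json
from datetime import datetime, timezone


def make_log_entry(level: str, message: str, timestamp: datetime) -> str:
    """Serialize one log event as a single JSON line."""
    entry = {
        # Normalize to UTC so the trailing "Z" is accurate.
        "timestamp": timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": level,
        "message": message,
    }
    return json.dumps(entry)


line = make_log_entry(
    "ERROR",
    "Failed to connect",
    datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(line)

# Because the line is valid JSON, any consumer can parse it back into fields.
parsed = json.loads(line)
print(parsed["level"])
```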
3. Given this trace data snippet for a request that passes through three microservices one after another, what is the total time spent processing the request?
{
  "traceId": "abc123",
  "spans": [
    {"service": "A", "duration_ms": 50},
    {"service": "B", "duration_ms": 30},
    {"service": "C", "duration_ms": 20}
  ]
}
medium
A. 100 ms
B. 50 ms
C. 30 ms
D. 20 ms

Solution

  1. Step 1: Understand trace spans and durations

    Each span shows the time spent in one service. The total time is the sum of the span durations, provided the services run sequentially with no overlap.
  2. Step 2: Sum durations of all spans

    50 ms + 30 ms + 20 ms = 100 ms total processing time.
  3. Final Answer:

    100 ms -> Option A
  4. Quick Check:

    Sum spans durations = 100 ms [OK]
Hint: For sequential spans, add all span durations to get the total trace time [OK]
Common Mistakes:
  • Taking only the longest span as total time
  • Ignoring some spans in calculation
  • Confusing traceId with duration
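
The calculation in the solution can be checked with a short script that parses the snippet from the question and sums the span durations. It assumes, as the solution does, that the three spans run sequentially and do not overlap.

```python
import json

# The trace snippet from the question.
trace_json = """
{
  "traceId": "abc123",
  "spans": [
    {"service": "A", "duration_ms": 50},
    {"service": "B", "duration_ms": 30},
    {"service": "C", "duration_ms": 20}
  ]
}
"""

trace = json.loads(trace_json)

# Valid only when spans are sequential; nested or parallel spans would be double-counted.
total_ms = sum(span["duration_ms"] for span in trace["spans"])
print(f"total: {total_ms} ms")
```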
4. A developer notices that logs are missing trace IDs in a microservices system. What is the most likely cause?
medium
A. Services are using different programming languages.
B. Metrics collection is disabled.
C. Logs are stored in a different database.
D. Trace context is not propagated between services.

Solution

  1. Step 1: Understand trace ID propagation

    Trace IDs must be passed along service calls to link logs and traces.
  2. Step 2: Identify cause of missing trace IDs

    If trace context is not propagated, logs won't have trace IDs, breaking trace-log correlation.
  3. Final Answer:

    Trace context is not propagated between services. -> Option D
  4. Quick Check:

    Missing trace IDs = missing context propagation [OK]
Hint: Trace IDs must flow between services to appear in logs [OK]
Common Mistakes:
  • Confusing metrics with trace IDs
  • Assuming storage location causes missing IDs
  • Blaming programming language differences
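
The effect of propagation on log correlation can be sketched with two in-process functions standing in for services. Everything here is hypothetical: the `x-trace-id` header name, the `service_a` and `service_b` functions, and the in-memory `log_lines` list are assumptions for the example, not a real tracing API.

```python
import json
import uuid

TRACE_HEADER = "x-trace-id"  # hypothetical header name for this sketch

log_lines = []  # stands in for a centralized log store


def log(service: str, message: str, trace_id):
    """Write a structured log line that carries the trace ID."""
    log_lines.append(json.dumps(
        {"service": service, "trace_id": trace_id, "message": message}
    ))


def service_b(headers: dict):
    # Read the trace ID the caller sent; None means the context was lost.
    trace_id = headers.get(TRACE_HEADER)
    log("B", "handling request", trace_id)


def service_a(propagate: bool = True) -> str:
    trace_id = uuid.uuid4().hex
    log("A", "calling B", trace_id)
    # Propagation: forward the trace ID in the outgoing request headers.
    outgoing_headers = {TRACE_HEADER: trace_id} if propagate else {}
    service_b(outgoing_headers)
    return trace_id


trace_id = service_a(propagate=True)
for line in log_lines:
    print(line)
```

Calling `service_a(propagate=False)` instead leaves service B's log line with a null `trace_id`, which is the symptom described in the question.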
5. You are designing a microservices system and want to implement the three pillars: metrics, logs, and traces. Which approach best ensures scalability and effective monitoring?
hard
A. Use a centralized monitoring system that collects metrics via Prometheus, logs via ELK stack, and traces via OpenTelemetry.
B. Store all logs and traces locally on each service to reduce network overhead.
C. Only collect metrics and ignore logs and traces to save storage space.
D. Send all raw logs and traces directly to the client application for analysis.

Solution

  1. Step 1: Identify best practices for scalable monitoring

    Centralized collection with specialized tools, such as Prometheus for metrics, the ELK stack for logs, and OpenTelemetry for traces, is a widely used approach for scalable monitoring and analysis.
  2. Step 2: Evaluate options for scalability and effectiveness

    Local storage limits analysis and scalability; ignoring logs/traces loses insights; sending raw data to clients is inefficient and insecure.
  3. Final Answer:

    Use a centralized monitoring system that collects metrics via Prometheus, logs via ELK stack, and traces via OpenTelemetry. -> Option A
  4. Quick Check:

    Centralized, specialized tools = scalable monitoring [OK]
Hint: Centralize collection with proven tools for all three pillars [OK]
Common Mistakes:
  • Storing logs/traces locally only
  • Ignoring logs or traces
  • Sending raw data directly to clients