
Metrics collection (Prometheus) in Microservices - Practice Problems & Coding Challenges

Challenge - 5 Problems
💻 Command Output
intermediate
Prometheus Query Output for HTTP Request Count
Given a Prometheus metric http_requests_total that counts all HTTP requests, what is the output of this query?

sum(rate(http_requests_total[5m]))

Assume the counter increases steadily by 10 requests per minute in total.
A. 0.3333333
B. 2
C. 0.1666667
D. 10
💡 Hint
Think about how rate() converts the increase over the 5-minute window into a per-second value.
Configuration
intermediate
Prometheus Scrape Configuration for Multiple Targets
Which Prometheus scrape configuration correctly scrapes metrics from two microservices running on ports 8080 and 9090 on the same host?
A
scrape_configs:
  - job_name: 'microservices'
    static_configs:
      - targets: ['localhost:8080 localhost:9090']
B
scrape_configs:
  - job_name: 'microservices'
    static_configs:
      - targets: ['localhost:8080']
      - targets: ['localhost:9090']
C
scrape_configs:
  - job_name: 'microservices'
    static_configs:
      - targets: ['localhost:8080', 'localhost:9090']
D
scrape_configs:
  - job_name: 'microservices'
    static_configs:
      - targets: ['localhost:8080;localhost:9090']
💡 Hint
Targets should be a list of strings, each in host:port form.
Troubleshoot
advanced
Diagnosing Missing Metrics in Prometheus
You configured Prometheus to scrape a microservice, but no metrics appear for it in the Prometheus UI. Which of the following is the most likely cause?
A. Prometheus is scraping metrics, but the query used in the UI is incorrect.
B. The Prometheus server is running, but its disk is full.
C. The microservice is exposing metrics, but the Prometheus scrape interval is set to 1 hour.
D. The microservice is not exposing metrics on the configured endpoint.
💡 Hint
Check if the microservice endpoint for metrics is reachable and returns data.
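The check described in the hint can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a Prometheus tool: the function names are hypothetical, and the URL in the usage comment is an assumed example address.

```python
import urllib.error
import urllib.request


def summarize_metrics_body(body: str) -> str:
    """Describe a response body in terms of Prometheus text-format sample lines."""
    # Lines starting with '#' are HELP/TYPE comments; the rest are samples.
    samples = [ln for ln in body.splitlines() if ln.strip() and not ln.startswith("#")]
    if not samples:
        return "reachable, but no samples in response"
    return f"ok, {len(samples)} sample lines"


def check_metrics_endpoint(url: str, timeout: float = 2.0) -> str:
    """Fetch a metrics URL and report what a scraper would run into."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:  # e.g. 404 when the path is wrong
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:  # refused, DNS failure, timeout
        return f"unreachable: {exc}"
    return summarize_metrics_body(body)


# Usage (assumed address; substitute the target from your scrape config):
# print(check_metrics_endpoint("http://localhost:8080/metrics"))
```

Running the same request from the host where Prometheus runs matters, because an endpoint reachable from a developer machine may still be unreachable from the Prometheus server.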
🔀 Workflow
advanced
Prometheus Alerting Workflow
Which sequence correctly describes the workflow for alerting when a microservice's error rate exceeds a threshold using Prometheus and Alertmanager?
A. 2,1,3,4
B. 1,2,3,4
C. 1,3,2,4
D. 3,1,2,4
💡 Hint
Think about the order from data collection to notification.
Best Practice
expert
Optimizing Prometheus Metrics for High-Cardinality Labels
Which approach is best to reduce performance issues caused by high-cardinality labels in Prometheus metrics?
A. Remove or limit labels that have many unique values, such as user IDs or session tokens.
B. Increase the Prometheus scrape interval to reduce the number of data points collected.
C. Store all raw metrics in long-term storage without aggregation.
D. Add more labels to identify each metric more uniquely.
💡 Hint
High-cardinality labels create many time series, which can slow down Prometheus.
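To make the hint concrete, the sketch below estimates how many time series one metric can produce. It assumes the worst case in which every combination of label values occurs, so the result is an upper bound; the label names and counts are made-up illustration values.

```python
from math import prod


def max_series_count(label_cardinalities: dict) -> int:
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for a single request counter.
bounded = {"method": 5, "status": 10}
with_user_id = {**bounded, "user_id": 100_000}

print(max_series_count(bounded))       # 50
print(max_series_count(with_user_id))  # 5000000
```

One unbounded label multiplies the series count by its number of distinct values, which is why such labels dominate the cost.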

Practice

1. What is the main purpose of Prometheus in a microservices environment?
easy
A. To collect and store metrics from services for monitoring
B. To deploy microservices automatically
C. To manage user authentication
D. To serve web pages to users

Solution

  1. Step 1: Understand the role of Prometheus

    Prometheus collects numerical measurements, called metrics, from running services and stores them as time series.
  2. Step 2: Identify monitoring purpose

    These metrics help monitor service health and performance in microservices.
  3. Final Answer:

    To collect and store metrics from services for monitoring -> Option A
  4. Quick Check:

    Prometheus = Metrics collection [OK]
Hint: Prometheus is for metrics, not deployment or auth [OK]
Common Mistakes:
  • Confusing Prometheus with deployment tools
  • Thinking Prometheus manages users
  • Assuming Prometheus serves web content
2. Which YAML configuration snippet correctly defines a Prometheus scrape job for a service at http://localhost:8080/metrics?
easy
A.
jobs:
  - job: 'myservice'
    endpoints: ['localhost:8080']
B.
scrape_configs:
  - job_name: 'myservice'
    static_configs:
      - targets: ['http://localhost:8080/metrics']
C.
scrape_configs:
  - job_name: 'myservice'
    static_configs:
      - targets: ['localhost:8080']
D.
scrape_jobs:
  - name: 'myservice'
    targets: ['localhost:8080/metrics']

Solution

  1. Step 1: Check Prometheus YAML syntax

    Prometheus uses scrape_configs with job_name and static_configs listing targets as host:port without URL path.
  2. Step 2: Validate target format

    Targets must be host:port only, no http:// or path like /metrics.
  3. Final Answer:

    scrape_configs:
      - job_name: 'myservice'
        static_configs:
          - targets: ['localhost:8080']
    -> Option C
  4. Quick Check:

    Targets = host:port only [OK]
Hint: Targets list host:port only, no URL scheme or path [OK]
Common Mistakes:
  • Including http:// or /metrics in targets
  • Using wrong YAML keys like scrape_jobs or jobs
  • Misnaming job_name or static_configs
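The host:port rule from the solution above can be expressed as a small check. This is a hypothetical helper for illustration, not part of Prometheus, and it is deliberately stricter than Prometheus's own parsing: it simply rejects anything containing a scheme, a path, or a separator.

```python
def is_bare_host_port(target: str) -> bool:
    """Return True if target looks like 'host:port' with no scheme, path, or separators."""
    if any(ch in target for ch in ("/", " ", ";", ",")):
        return False  # catches 'http://', '/metrics', and several hosts in one string
    host, sep, port = target.rpartition(":")
    return bool(host) and sep == ":" and port.isdecimal() and 0 < int(port) < 65536


for candidate in ["localhost:8080",
                  "http://localhost:8080/metrics",
                  "localhost:8080/metrics",
                  "localhost:8080 localhost:9090"]:
    print(candidate, "->", is_bare_host_port(candidate))
```

Only the first candidate passes; the others repeat the mistakes listed above.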
3. Given this Prometheus query: rate(http_requests_total[5m]), what does it calculate?
medium
A. The average rate of HTTP requests per second over the last 5 minutes
B. The current number of active HTTP requests
C. The total number of HTTP requests since service start
D. The maximum number of HTTP requests in the last 5 minutes

Solution

  1. Step 1: Understand rate() function

    The rate() function calculates the per-second average increase of a counter over a time window.
  2. Step 2: Apply to http_requests_total[5m]

    This means it measures how fast the total HTTP requests counter increased in the last 5 minutes, giving requests per second.
  3. Final Answer:

    The average rate of HTTP requests per second over the last 5 minutes -> Option A
  4. Quick Check:

    rate() = per-second average increase [OK]
Hint: rate() gives per-second average over time window [OK]
Common Mistakes:
  • Thinking rate() returns total count
  • Confusing rate() with current active requests
  • Assuming rate() returns max value
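The per-second average can be worked through with plain arithmetic. The sketch below is a simplification under stated assumptions: it takes only the first and last sample in the window, whereas the real rate() also handles counter resets and extrapolates to the window edges. The sample data is invented.

```python
def average_rate_per_second(samples):
    """samples: list of (timestamp_seconds, counter_value) pairs, oldest first."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    # Increase of the counter divided by the elapsed time in seconds.
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical counter that grows by 30 requests every minute, sampled for 5 minutes.
window = [(t, 30 * (t // 60)) for t in range(0, 301, 60)]
print(average_rate_per_second(window))  # 0.5 requests per second
```

The counter rose by 150 over 300 seconds, so the result is 0.5 per second, not 150 and not 30.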
4. You configured Prometheus to scrape localhost:9090 but no metrics appear. Which fix is correct?
medium
A. Change target to localhost:9090/metrics in YAML
B. Remove job_name from config
C. Restart Prometheus to reload config
D. Add metrics_path: '/metrics' under the scrape job

Solution

  1. Step 1: Understand default metrics path

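    Prometheus requests the /metrics path by default. If the service exposes metrics on a different path, that path must be configured on the scrape job.
  2. Step 2: Set the path with metrics_path

    Of the four options, only metrics_path controls the URL path that Prometheus requests. Setting it to '/metrics' restates the default; if the service exposes metrics elsewhere, set metrics_path to that path. The path never belongs in targets.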
  3. Final Answer:

    Add metrics_path: '/metrics' under the scrape job -> Option D
  4. Quick Check:

    metrics_path fixes scrape URL [OK]
Hint: Use metrics_path to set correct scrape URL path [OK]
Common Mistakes:
  • Adding path in targets instead of metrics_path
  • Restarting without config fix
  • Removing job_name breaks config
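How the scrape URL is put together explains why the path belongs in metrics_path and not in targets. The function below is a hypothetical illustration of that composition, using the Prometheus defaults for scheme (http) and metrics_path (/metrics); the second path is only an example of a non-default location.

```python
def scrape_url(target: str, scheme: str = "http", metrics_path: str = "/metrics") -> str:
    """Compose the URL a scraper requests from the scrape job's separate settings."""
    return f"{scheme}://{target}{metrics_path}"


print(scrape_url("localhost:9090"))
# http://localhost:9090/metrics

# A service that exposes metrics on another path (example value):
print(scrape_url("localhost:9090", metrics_path="/actuator/prometheus"))
# http://localhost:9090/actuator/prometheus
```

Because the path is appended to the target, putting /metrics inside the target string would not produce the intended URL.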
5. You want to monitor error rates in a microservice using Prometheus. The service exposes http_requests_total with labels status and method. Which query shows the error rate (status codes 500-599) over the last 10 minutes as a percentage of all requests?
hard
A. rate(http_requests_total{status=~"5.."}[10m]) / rate(http_requests_total[10m]) * 100
B. sum(rate(http_requests_total{status=~"5.."}[10m])) / sum(rate(http_requests_total[10m])) * 100
C. sum(rate(http_requests_total{status=~"5.."}[10m])) * 100
D. sum(rate(http_requests_total{status!~"5.."}[10m])) / sum(rate(http_requests_total[10m])) * 100

Solution

  1. Step 1: Filter error status codes 500-599

    Use regex status=~"5.." to select error codes in the 500 range.
  2. Step 2: Calculate error rate as percentage

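    Sum the per-second rate of error requests, divide by the summed rate of all requests, and multiply by 100 to get a percentage. Summing first matters because http_requests_total carries status and method labels: dividing the unaggregated vectors (Option A) pairs up series with identical label sets, so each error series is divided by itself.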
  3. Final Answer:

    sum(rate(http_requests_total{status=~"5.."}[10m])) / sum(rate(http_requests_total[10m])) * 100 -> Option B
  4. Quick Check:

    Error rate % = error requests / total requests * 100 [OK]
Hint: Sum rates before division for correct percentage [OK]
Common Mistakes:
  • Dividing single rates instead of sums
  • Using wrong label regex
  • Multiplying before division
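The arithmetic behind the query can be checked outside Prometheus. The sketch below assumes the per-series rates have already been computed and stores them in a dictionary keyed by (status, method); the numbers are invented, and returning 0.0 for an idle service is a choice made for this sketch.

```python
import re


def error_rate_percent(rates: dict) -> float:
    """rates maps (status, method) -> requests per second for one series each."""
    total = sum(rates.values())
    # Same selection as status=~"5..": exactly three characters starting with 5.
    errors = sum(rate for (status, _method), rate in rates.items()
                 if re.fullmatch(r"5..", status))
    return errors / total * 100 if total else 0.0


# Hypothetical per-series rates over the last 10 minutes.
rates = {
    ("200", "GET"): 9.0,
    ("500", "GET"): 0.5,
    ("503", "POST"): 0.5,
}
print(error_rate_percent(rates))  # 10.0
```

Both sums run over all series before the division, which mirrors wrapping each side of the query in sum().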