Metrics collection (Prometheus) in Microservices - Architecture Diagram

System Overview - Metrics collection (Prometheus)

This system uses Prometheus to collect and monitor performance metrics from multiple microservices. It gives near-real-time visibility into service health and resource usage, limited by how often Prometheus scrapes each service, to help maintain system reliability and performance.

Architecture Diagram
User
  |
  v
Load Balancer
  |
  v
API Gateway
  |
  v
+--------------------+       +----------------+
| Microservice A     |<----->| Prometheus     |
| (metrics endpoint) |       | (metrics store)|
+--------------------+       +----------------+
  |
  v
+--------------------+
| Microservice B     |
| (metrics endpoint) |
+--------------------+

Prometheus --> Grafana Dashboard

Components
User
user
End user interacting with the system
Load Balancer
load_balancer
Distributes incoming user requests evenly across API Gateway instances
API Gateway
api_gateway
Routes requests to appropriate microservices and handles authentication
Microservice A
service
Business logic service exposing metrics endpoint for Prometheus scraping
Microservice B
service
Another business logic service exposing metrics endpoint for Prometheus scraping
Prometheus
metrics_store
Scrapes metrics from microservices and stores them for querying
Grafana Dashboard
visualization_tool
Visualizes metrics data from Prometheus for monitoring and alerting
Request Flow - 6 Hops
User -> Load Balancer
Load Balancer -> API Gateway
API Gateway -> Microservice A or B
Prometheus -> Microservice A
Prometheus -> Microservice B
Prometheus -> Grafana Dashboard
Failure Scenario
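Component Fails: Prometheus
Impact: Metrics scraping stops, monitoring data becomes stale, and alerts may not trigger
Mitigation: Run two or more identical Prometheus replicas that scrape the same targets, and monitor Prometheus health from outside the instance being watched so an outage still raises an alert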
Architecture Quiz - 3 Questions
Test your understanding
Which component is responsible for collecting metrics from microservices?
A. Prometheus
B. API Gateway
C. Load Balancer
D. Grafana Dashboard
Design Principle
This architecture uses a pull-based metrics collection model: Prometheus scrapes the metrics endpoints that each microservice exposes, on a schedule that Prometheus controls. This keeps scrape configuration in one place, lets Prometheus record an unreachable service as a failed scrape, and decouples metrics collection from service logic.
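
To make the pull model concrete, here is a minimal sketch of the service side, using only the Python standard library to serve counters in the Prometheus text exposition format. The counter values, the port 8080, and the helper names render_metrics, MetricsHandler, and serve are assumptions made for illustration. A real service would normally use an official client library instead of writing the format by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counters, keyed by (method, status) label values.
REQUEST_COUNTS = {("GET", "200"): 42, ("POST", "500"): 3}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus requests /metrics unless the scrape job sets another path.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block and serve metrics; the port is an assumption for this sketch."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint, and a scrape job targeting localhost:8080 would then pull from it. The service never needs to know where Prometheus runs, which is the decoupling the design principle describes.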

Practice

1. What is the main purpose of Prometheus in a microservices environment?
easy
A. To collect and store metrics from services for monitoring
B. To deploy microservices automatically
C. To manage user authentication
D. To serve web pages to users

Solution

  1. Step 1: Understand Prometheus role

    Prometheus is designed to collect numerical data called metrics from running services.
  2. Step 2: Identify monitoring purpose

    These metrics help monitor service health and performance in microservices.
  3. Final Answer:

    To collect and store metrics from services for monitoring -> Option A
  4. Quick Check:

    Prometheus = Metrics collection [OK]
Hint: Prometheus is for metrics, not deployment or auth [OK]
Common Mistakes:
  • Confusing Prometheus with deployment tools
  • Thinking Prometheus manages users
  • Assuming Prometheus serves web content
2. Which YAML configuration snippet correctly defines a Prometheus scrape job for a service at http://localhost:8080/metrics?
easy
A.
    jobs:
      - job: 'myservice'
        endpoints: ['localhost:8080']
B.
    scrape_configs:
      - job_name: 'myservice'
        static_configs:
          - targets: ['http://localhost:8080/metrics']
C.
    scrape_configs:
      - job_name: 'myservice'
        static_configs:
          - targets: ['localhost:8080']
D.
    scrape_jobs:
      - name: 'myservice'
        targets: ['localhost:8080/metrics']

Solution

  1. Step 1: Check Prometheus YAML syntax

    Prometheus uses scrape_configs with job_name and static_configs listing targets as host:port without URL path.
  2. Step 2: Validate target format

    Targets must be host:port only, no http:// or path like /metrics.
  3. Final Answer:

    scrape_configs:
      - job_name: 'myservice'
        static_configs:
          - targets: ['localhost:8080']
    -> Option C
  4. Quick Check:

    Targets = host:port only [OK]
Hint: Targets list host:port only, no URL scheme or path [OK]
Common Mistakes:
  • Including http:// or /metrics in targets
  • Using wrong YAML keys like scrape_jobs or jobs
  • Misnaming job_name or static_configs
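
The target rule above can be sketched as a small check. The dict below mirrors the structure of the correct scrape job, and the hypothetical helper find_invalid_targets flags any target that carries a URL scheme or a path. This is a simplified illustration of the rule stated in the solution, not Prometheus's own validation, which is stricter.

```python
# A Python dict mirroring the structure of the scrape job in Option C.
CONFIG = {
    "scrape_configs": [
        {
            "job_name": "myservice",
            "static_configs": [{"targets": ["localhost:8080"]}],
        }
    ]
}


def find_invalid_targets(config):
    """Return targets that contain a URL scheme or a path.

    Simplified check for illustration only (an assumption of this sketch).
    """
    bad = []
    for job in config.get("scrape_configs", []):
        for group in job.get("static_configs", []):
            for target in group.get("targets", []):
                if "://" in target or "/" in target:
                    bad.append(target)
    return bad


# The same job written as in Option B, with scheme and path in the target.
OPTION_B = {
    "scrape_configs": [
        {
            "job_name": "myservice",
            "static_configs": [{"targets": ["http://localhost:8080/metrics"]}],
        }
    ]
}
```

Running find_invalid_targets on CONFIG returns an empty list, while OPTION_B reports its single target as invalid.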
3. Given this Prometheus query: rate(http_requests_total[5m]), what does it calculate?
medium
A. The average rate of HTTP requests per second over the last 5 minutes
B. The current number of active HTTP requests
C. The total number of HTTP requests since service start
D. The maximum number of HTTP requests in the last 5 minutes

Solution

  1. Step 1: Understand rate() function

    The rate() function calculates the per-second average increase of a counter over a time window.
  2. Step 2: Apply to http_requests_total[5m]

    This means it measures how fast the total HTTP requests counter increased in the last 5 minutes, giving requests per second.
  3. Final Answer:

    The average rate of HTTP requests per second over the last 5 minutes -> Option A
  4. Quick Check:

    rate() = per-second average increase [OK]
Hint: rate() gives per-second average over time window [OK]
Common Mistakes:
  • Thinking rate() returns total count
  • Confusing rate() with current active requests
  • Assuming rate() returns max value
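
The arithmetic behind rate() can be approximated in a few lines. The sketch below takes (timestamp, value) samples of a counter, adds up the increases between consecutive samples, treats any drop as a counter reset, and divides by the elapsed seconds. The function name simple_rate and the sample data are assumptions for illustration. Prometheus additionally extrapolates toward the edges of the time window, which this sketch deliberately leaves out.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Simplification: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        return None  # two samples are the minimum needed to compute a rate
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset; count the new value from zero.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples scraped every 60 s across a 5-minute window.
window = [(0, 100), (60, 130), (120, 160), (180, 190), (240, 220), (300, 250)]
requests_per_second = simple_rate(window)  # 150 requests over 300 s
```

Here the counter grows by 150 over 300 seconds, so the result is 0.5 requests per second. The total count (250) and the window length (5 minutes) never appear in the answer on their own, which is why options B, C, and D are wrong.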
4. You configured Prometheus to scrape localhost:9090, but the service serves its metrics on a path other than /metrics and no metrics appear. Which fix is correct?
medium
A. Change target to localhost:9090/metrics in YAML
B. Remove job_name from config
C. Restart Prometheus to reload config
D. Set metrics_path under the scrape job to the path the service actually serves

Solution

  1. Step 1: Understand default metrics path

    Prometheus scrapes the /metrics path by default. If the service exposes metrics on a different path, the scrape fails until you specify that path.
  2. Step 2: Fix missing metrics path

    Setting metrics_path under the scrape job tells Prometheus which path to request. Writing metrics_path: '/metrics' only restates the default and changes nothing; the path belongs in metrics_path, never in targets.
  3. Final Answer:

    Set metrics_path under the scrape job to the path the service actually serves -> Option D
  4. Quick Check:

    metrics_path fixes scrape URL [OK]
Hint: Use metrics_path to set correct scrape URL path [OK]
Common Mistakes:
  • Adding path in targets instead of metrics_path
  • Restarting without config fix
  • Removing job_name breaks config
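
How the scrape URL is assembled explains why the path must not go into targets. In the sketch below, the hypothetical helper scrape_url combines a job's scheme, a target, and the job's metrics_path, falling back to the Prometheus defaults of http and /metrics. The path /internal/metrics is an invented example, not something taken from the question.

```python
def scrape_url(job, target):
    """Build the URL a scrape would request for one target.

    job: dict of scrape-job settings; scheme and metrics_path are optional
    and default to "http" and "/metrics".
    """
    scheme = job.get("scheme", "http")
    path = job.get("metrics_path", "/metrics")
    return f"{scheme}://{target}{path}"


# A job that relies on the defaults.
default_job = {"job_name": "myservice"}

# A job for a service with a non-default path (hypothetical path).
custom_job = {"job_name": "myservice", "metrics_path": "/internal/metrics"}

default_url = scrape_url(default_job, "localhost:9090")
custom_url = scrape_url(custom_job, "localhost:9090")
```

With the defaults the request goes to http://localhost:9090/metrics, so setting metrics_path to '/metrics' produces exactly the same URL. Only a different metrics_path value changes what is scraped.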
5. You want to monitor error rates in a microservice using Prometheus. The service exposes http_requests_total with labels status and method. Which query shows the error rate (status codes 500-599) over the last 10 minutes as a percentage of all requests?
hard
A. rate(http_requests_total{status=~"5.."}[10m]) / rate(http_requests_total[10m]) * 100
B. sum(rate(http_requests_total{status=~"5.."}[10m])) / sum(rate(http_requests_total[10m])) * 100
C. sum(rate(http_requests_total{status=~"5.."}[10m])) * 100
D. sum(rate(http_requests_total{status!~"5.."}[10m])) / sum(rate(http_requests_total[10m])) * 100

Solution

  1. Step 1: Filter error status codes 500-599

    Use regex status=~"5.." to select error codes in the 500 range.
  2. Step 2: Calculate error rate as percentage

    Sum the rate of error requests and divide by sum of all requests rate, then multiply by 100 for percentage.
  3. Final Answer:

    sum(rate(http_requests_total{status=~"5.."}[10m])) / sum(rate(http_requests_total[10m])) * 100 -> Option B
  4. Quick Check:

    Error rate % = error requests / total requests * 100 [OK]
Hint: Sum rates before division for correct percentage [OK]
Common Mistakes:
  • Dividing per-series rates without sum(), which gives one ratio per label combination instead of a single overall percentage
  • Using wrong label regex
  • Leaving out the division by total requests, which gives errors per second rather than a percentage
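
The sum-then-divide step can be checked with a small worked example. The sketch below assumes the per-second rates for each (method, status) series have already been computed, then sums the 5xx series and divides by the sum of all series. The rate values and the function name error_percentage are made up for illustration. The status match uses re.fullmatch because PromQL regex matchers are fully anchored.

```python
import re


def error_percentage(rates):
    """Percentage of requests with a 5xx status.

    rates: dict mapping (method, status) -> per-second request rate.
    Returns None when there is no traffic (PromQL would yield NaN instead).
    """
    total = sum(rates.values())
    if total == 0:
        return None
    errors = sum(
        rate
        for (_, status), rate in rates.items()
        if re.fullmatch("5..", status)  # same pattern as status=~"5.."
    )
    return errors / total * 100


# Hypothetical per-series rates over the last 10 minutes.
series_rates = {
    ("GET", "200"): 4.0,
    ("POST", "200"): 2.0,
    ("GET", "500"): 1.5,
    ("POST", "503"): 0.5,
}
percent = error_percentage(series_rates)  # 2.0 errors/s out of 8.0 requests/s
```

Summing first gives 2.0 error requests per second out of 8.0 in total, or 25 percent. Dividing each series by its own counterpart, as in Option A, would compare every 5xx series only with itself.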