Design: Metrics Collection System with Prometheus
Includes metrics collection, storage, querying, and alerting. Excludes detailed dashboard UI design and long-term archival beyond 15 days.
Functional Requirements
Non-Functional Requirements
                     +----------------+
                     |   Grafana UI   |
                     +--------+-------+
                              |
                              v
+----------------+      +-----+------+      +--------------+
| Microservices  | ---> | Prometheus | ---> | Alertmanager |
| (with Exporter)|      |   Server   |      +--------------+
+----------------+      +-----+------+
                              |
                              v
                       +------+-------+
                       | TSDB Storage |
                       +--------------+

A service exposes metrics at http://localhost:8080/metrics. Which scrape configuration is correct?
Use scrape_configs with a job_name and static_configs, listing targets as host:port without the URL path.

Given the query rate(http_requests_total[5m]), what does it calculate?
Understand the rate() function: rate() calculates the per-second average increase of a counter over a time window.
Apply it to http_requests_total[5m]: the window is the last 5 minutes, so the result is the per-second average rate of HTTP requests over that period.

Prometheus is configured to scrape a service at localhost:9090 but no metrics appear. Which fix is correct?
Prometheus scrapes the /metrics path by default, but if the service uses a different path, you must specify it.
Setting metrics_path: '/metrics' explicitly tells Prometheus where to fetch metrics, either to override a non-default path or to confirm the default.
Answer: add metrics_path: '/metrics' under the scrape job (Option D).

A counter http_requests_total has the labels status and method. Which query shows the error rate (status codes 500-599) over the last 10 minutes as a percentage of all requests?
Use the matcher status=~"5.." to select error codes in the 500 range, then divide the error request rate by the total request rate and multiply by 100.
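
The arithmetic behind the rate() and error-percentage questions can be sketched in Python. This is only an illustration under stated assumptions: simple_rate and error_percentage are hypothetical helper names, the sample data is invented, and the rate calculation is a simplification that handles counter resets but skips the window-boundary extrapolation Prometheus performs. In PromQL itself, a query of roughly this shape would express the same idea: sum(rate(http_requests_total{status=~"5.."}[10m])) / sum(rate(http_requests_total[10m])) * 100.

```python
import re
from typing import Dict, List, Tuple

# (timestamp in seconds, counter value); a hypothetical in-memory stand-in
# for the samples Prometheus would have scraped within the query window.
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Per-second average increase of a counter across the samples.

    Simplified on purpose: no boundary extrapolation, and fewer than two
    samples yields 0.0 (Prometheus would return no result instead).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def error_percentage(series_by_status: Dict[str, List[Sample]]) -> float:
    """Rate of 5xx requests as a percentage of the rate of all requests."""
    total = sum(simple_rate(s) for s in series_by_status.values())
    errors = sum(
        simple_rate(s)
        for status, s in series_by_status.items()
        # Label regex matchers are fully anchored, so status=~"5.."
        # behaves like a full match on exactly three characters.
        if re.fullmatch(r"5..", status)
    )
    return 100.0 * errors / total if total > 0 else 0.0


# Invented samples covering a 10-minute (600 s) window, keyed by status label.
series = {
    "200": [(0, 1000), (300, 1540), (600, 2080)],
    "404": [(0, 50), (300, 80), (600, 110)],
    "500": [(0, 10), (300, 28), (600, 46)],
    "503": [(0, 5), (300, 17), (600, 29)],
}

print(f"{error_percentage(series):.2f}% of requests failed")
```

With this data the 5xx series grow by 60 requests in total while all series together grow by 1200, so the error share comes out at about 5 percent. Summing the rates before dividing mirrors the sum(...) aggregation in the query, which collapses the method and status labels into a single ratio.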