Design: Metrics Collection System with Prometheus
This design covers metrics collection, storage, querying, and alerting. It excludes detailed dashboard UI design and long-term archival beyond 15 days.
Functional Requirements
FR1: Collect real-time metrics from multiple microservices
FR2: Support scraping metrics at regular intervals (e.g., every 15 seconds)
FR3: Store metrics data efficiently for querying and alerting
FR4: Provide a dashboard for visualizing metrics
FR5: Support alerting based on defined thresholds
FR6: Handle up to 10,000 metric samples per second in total across 100 microservices
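
The load in FR6 and the interval in FR2 can be turned into rough sizing figures. The sketch below is a back-of-the-envelope estimate, not a measured result. It assumes that FR6 refers to ingested samples per second, that every series produces one sample per scrape, and that a sample costs about 2 bytes on disk (the upper end of the 1-2 bytes per sample that the Prometheus storage documentation gives as an average). The function name capacity_estimate and its parameters are hypothetical, introduced only for illustration.

```python
# Rough sizing derived from FR2, FR6 and the default retention.
SECONDS_PER_DAY = 86_400


def capacity_estimate(samples_per_second, services, scrape_interval_s,
                      retention_days, bytes_per_sample):
    """Return rough sizing figures for a pull-based metrics store."""
    # Each active series yields one sample per scrape, so the number of
    # active series is the ingest rate multiplied by the scrape interval.
    active_series = samples_per_second * scrape_interval_s
    retained_samples = samples_per_second * SECONDS_PER_DAY * retention_days
    return {
        "active_series": active_series,
        "series_per_service": active_series // services,
        "retained_samples": retained_samples,
        "storage_gb": retained_samples * bytes_per_sample / 1e9,
    }


estimate = capacity_estimate(
    samples_per_second=10_000,  # FR6, read as samples per second (assumption)
    services=100,               # FR6
    scrape_interval_s=15,       # FR2 example interval
    retention_days=15,          # default retention
    bytes_per_sample=2,         # assumption: upper end of 1-2 bytes per sample
)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

Under these assumptions the system tracks about 150,000 active series (1,500 per microservice) and retains roughly 26 GB of sample data over 15 days, before any index or write-ahead-log overhead. Changing the scrape interval or retention scales these figures linearly.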
Non-Functional Requirements
NFR1: Scrape latency (the time to complete one scrape of a target) should be under 5 seconds
NFR2: System availability should be 99.9%
NFR3: Storage retention for metrics data should be configurable (default 15 days)
NFR4: Minimal impact on microservices performance during metrics collection
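
To make NFR1 and NFR2 more concrete, the targets can be converted into budgets. The sketch below is illustrative only. The 30-day and 365-day measurement windows are assumptions, since the requirements do not state one, and the helper names downtime_budget_minutes and scrape_headroom_s are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct, window_days):
    """Minutes of allowed unavailability within the window."""
    unavailable_fraction = 1 - availability_pct / 100
    return round(unavailable_fraction * window_days * MINUTES_PER_DAY, 1)


def scrape_headroom_s(scrape_interval_s, max_scrape_latency_s):
    """Seconds left in each interval after a worst-case scrape."""
    return scrape_interval_s - max_scrape_latency_s


# NFR2: 99.9% availability over assumed 30-day and 365-day windows.
print(downtime_budget_minutes(99.9, 30))
print(downtime_budget_minutes(99.9, 365))

# NFR1 against the 15-second example interval from FR2.
print(scrape_headroom_s(15, 5))
```

With these inputs, 99.9% availability allows roughly 43 minutes of downtime per 30 days (under 9 hours per year), and a 5-second scrape limit leaves 10 seconds of slack within a 15-second interval.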