Design: Observability System for Microservices
This design covers the observability platform components that collect, store, and visualize metrics, logs, and traces. The implementation of the microservices themselves and the authoring of alerting rules are out of scope.
Functional Requirements
FR1: Collect and store metrics from all microservices to monitor performance and resource usage
FR2: Collect and store logs from microservices for debugging and auditing
FR3: Collect and store distributed traces to understand request flows across services
FR4: Provide real-time dashboards and alerting based on metrics
FR5: Allow querying and searching logs efficiently
FR6: Visualize traces to identify latency bottlenecks
FR7: Support at least 1000 microservices generating data concurrently
FR8: Retain metrics and logs for 30 days and traces for 7 days
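The retention rule in FR8 can be written down as a small lookup. The following is a minimal sketch, not a prescribed implementation: the names RETENTION_DAYS and is_expired are hypothetical, and a real deployment would more likely rely on the retention settings of whichever storage backend is chosen.

```python
from datetime import datetime, timedelta, timezone

# Retention windows from FR8, in days. The dictionary name and its keys
# are illustrative assumptions, not part of the design.
RETENTION_DAYS = {"metrics": 30, "logs": 30, "traces": 7}


def is_expired(signal: str, written_at: datetime, now: datetime) -> bool:
    """Return True if a record of this signal type is past its retention window."""
    return now - written_at > timedelta(days=RETENTION_DAYS[signal])


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
ten_days_ago = now - timedelta(days=10)

print(is_expired("traces", ten_days_ago, now))   # True: older than 7 days
print(is_expired("metrics", ten_days_ago, now))  # False: within 30 days
```

Keeping the windows in one table makes the shorter trace retention explicit and easy to change if the requirement is revised.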
Non-Functional Requirements
NFR1: The system must ingest 1 million metric data points per second
NFR2: The system must ingest up to 500,000 log entries per second
NFR3: Trace data must reach storage within 5 seconds of generation at p99
NFR4: The system must target 99.9% availability
NFR5: Data storage must be cost-effective and scalable
NFR6: Query APIs must respond within 2 seconds for common queries
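A quick back-of-envelope calculation shows what these targets imply. The sketch below uses only figures stated in the requirements (FR7, FR8, NFR1, NFR2, NFR4). The function and variable names are hypothetical, the per-service figures assume load is spread evenly across services, and the log total treats the NFR2 peak rate as sustained, so it is an upper bound. No bytes-per-record size is assumed because the requirements do not give one.

```python
SECONDS_PER_DAY = 86_400

# Figures taken directly from the requirements.
SERVICES = 1_000                    # FR7
METRIC_POINTS_PER_SEC = 1_000_000   # NFR1
LOG_ENTRIES_PER_SEC = 500_000       # NFR2 (peak rate)
RETENTION_DAYS = 30                 # FR8, metrics and logs
AVAILABILITY = 0.999                # NFR4


def records_retained(rate_per_sec: int, days: int) -> int:
    """Number of records held at steady state for a given rate and retention."""
    return rate_per_sec * SECONDS_PER_DAY * days


def downtime_budget_minutes(availability: float, days: int) -> float:
    """Minutes of allowed unavailability over a window of the given length."""
    return (1 - availability) * days * 24 * 60


metric_points = records_retained(METRIC_POINTS_PER_SEC, RETENTION_DAYS)
log_entries = records_retained(LOG_ENTRIES_PER_SEC, RETENTION_DAYS)

print(f"metric points retained: {metric_points:,}")   # 2,592,000,000,000
print(f"log entries retained:   {log_entries:,}")     # 1,296,000,000,000
print(f"metric points/s per service: {METRIC_POINTS_PER_SEC // SERVICES}")  # 1000
print(f"log entries/s per service:   {LOG_ENTRIES_PER_SEC // SERVICES}")    # 500
print(f"downtime budget per 30 days: {downtime_budget_minutes(AVAILABILITY, 30):.1f} min")  # 43.2 min
```

Multiplying the retained record counts by a measured average record size would give a first storage estimate, which is the input NFR5 needs before any cost comparison can be made.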