Design: Distributed Tracing System for Microservices
Design the tracing collection, storage, and visualization system. Instrumentation libraries and microservice code changes are out of scope.
Functional Requirements
FR1: Trace requests as they flow through multiple microservices
FR2: Collect timing and metadata for each service call
FR3: Visualize traces to identify latency and errors
FR4: Support high ingest throughput (see NFR1) while adding minimal overhead to traced services (see NFR4)
FR5: Allow querying traces by trace ID, service, or time range
FR6: Integrate with existing microservices without major code changes
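FR2 and FR5 together imply a span record carrying timing and metadata, plus two access paths: a point lookup by trace ID and a time-range scan per service. The following is a minimal in-memory sketch, assuming a hypothetical `Span` schema (field names are illustrative, not taken from any specific tracing library) and using sorted Python lists as a stand-in for a real storage backend.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical span record: one timed service call within a trace (FR2)."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None for the root span
    service: str
    operation: str
    start_us: int                  # start time, microseconds since epoch
    duration_us: int
    error: bool = False            # lets the UI highlight failing calls (FR3)
    tags: Dict[str, str] = field(default_factory=dict)


class TraceIndex:
    """In-memory stand-in for the trace store, indexed for the FR5 queries."""

    def __init__(self) -> None:
        self._by_trace: Dict[str, List[Span]] = defaultdict(list)
        # service -> (start_us, trace_id) pairs, kept sorted for range scans
        self._by_service: Dict[str, list] = defaultdict(list)

    def add(self, span: Span) -> None:
        self._by_trace[span.trace_id].append(span)
        bisect.insort(self._by_service[span.service], (span.start_us, span.trace_id))

    def get_trace(self, trace_id: str) -> List[Span]:
        """Query by trace ID; spans ordered by start time for a waterfall view."""
        return sorted(self._by_trace.get(trace_id, []), key=lambda s: s.start_us)

    def find_traces(self, service: str, start_us: int, end_us: int) -> List[str]:
        """Query by service over the half-open time range [start_us, end_us)."""
        entries = self._by_service.get(service, [])
        lo = bisect.bisect_left(entries, (start_us, ""))
        hi = bisect.bisect_left(entries, (end_us, ""))
        return sorted({trace_id for _, trace_id in entries[lo:hi]})


idx = TraceIndex()
idx.add(Span("t1", "a", None, "gateway", "GET /cart", 1_000, 900))
idx.add(Span("t1", "b", "a", "cart", "load_cart", 1_100, 600, error=True))
idx.add(Span("t2", "c", None, "gateway", "GET /home", 5_000, 200))
print([s.span_id for s in idx.get_trace("t1")])  # ['a', 'b']
print(idx.find_traces("gateway", 0, 2_000))      # ['t1']
```

A production store would swap the dictionaries for a partitioned backend, but it would likely still need these same two access paths.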
Non-Functional Requirements
NFR1: Handle up to 100,000 traces per second
NFR2: End-to-end latency under 500 ms for trace visualization
NFR3: 99.9% system availability
NFR4: Minimal impact on microservice performance (less than 5% overhead)
NFR5: Data retention for 7 days
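NFR1, NFR3, and NFR5 fix the volume the storage tier must hold and the downtime it may incur. The following back-of-the-envelope sketch takes the 100,000 traces/s, 7-day, and 99.9% figures from the requirements. It also assumes an average of 10 spans per trace and 500 bytes per encoded span; both are hypothetical figures chosen only for illustration. The estimate ignores replication and compression.

```python
SECONDS_PER_DAY = 86_400


def storage_estimate(traces_per_sec, retention_days, spans_per_trace, bytes_per_span):
    """Return (traces retained, spans retained, raw bytes retained)."""
    traces = traces_per_sec * SECONDS_PER_DAY * retention_days
    spans = traces * spans_per_trace
    return traces, spans, spans * bytes_per_span


def downtime_budget_minutes(availability, period_days):
    """Allowed unavailability over a period for a given availability target."""
    return (1 - availability) * period_days * 24 * 60


# Rates from NFR1 and NFR5; the two averages below are ASSUMPTIONS, not requirements.
TRACES_PER_SEC = 100_000
RETENTION_DAYS = 7
ASSUMED_SPANS_PER_TRACE = 10
ASSUMED_BYTES_PER_SPAN = 500

traces, spans, total_bytes = storage_estimate(
    TRACES_PER_SEC, RETENTION_DAYS, ASSUMED_SPANS_PER_TRACE, ASSUMED_BYTES_PER_SPAN)
ingest_bytes_per_sec = TRACES_PER_SEC * ASSUMED_SPANS_PER_TRACE * ASSUMED_BYTES_PER_SPAN

print(f"traces retained: {traces:,}")                        # 60,480,000,000
print(f"raw storage: {total_bytes / 1e12:.1f} TB")            # 302.4 TB
print(f"ingest rate: {ingest_bytes_per_sec / 1e6:.0f} MB/s")  # 500 MB/s
print(f"downtime budget (NFR3): {downtime_budget_minutes(0.999, 30):.1f} min per 30 days")  # 43.2
```

The trace count depends only on the stated requirements. The byte figures scale linearly with the assumed span count and size, so they should be recomputed once real averages are measured.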