Design: Distributed Tracing System
This design covers trace data collection, storage, query, and visualization. It does not cover instrumentation libraries in detail, and it does not cover alerting systems.
Functional Requirements
FR1: Collect trace data from multiple microservices in a distributed system
FR2: Track requests as they flow through different services
FR3: Visualize the end-to-end request path with timing information
FR4: Support high ingest throughput with minimal impact on the latency of instrumented services (targets in NFR1 and NFR5)
FR5: Allow querying traces by trace ID, service name, or time range
FR6: Provide sampling to control data volume
FR7: Integrate with existing logging and monitoring tools
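To make FR2 and FR6 more concrete, the following is a minimal sketch of what a span record and a sampling decision could look like. The field names, the Span class, and the should_sample function are illustrative assumptions, not part of this design. The sketch assumes head-based probabilistic sampling keyed on the trace ID, which is only one of several possible strategies.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """One timed operation within a trace (hypothetical field names)."""
    trace_id: str                  # shared by every span of one request (FR2)
    span_id: str
    parent_span_id: Optional[str]  # None for the root span
    service: str                   # queryable by service name (FR5)
    start_us: int                  # start time, microseconds since epoch
    duration_us: int               # timing used for visualization (FR3)


def should_sample(trace_id: str, rate: float) -> bool:
    """Keep roughly `rate` of all traces, deciding from the trace ID alone.

    Hashing the trace ID lets every service reach the same decision for the
    same request without coordination.
    """
    if rate <= 0.0:
        return False
    if rate >= 1.0:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    threshold = int(rate * 2**64)              # keep values below this cutoff
    return value < threshold
```

Because the decision depends only on the trace ID, a trace that is kept by one service is kept by all of them, so sampled traces stay complete. A possible usage is to call should_sample(span.trace_id, 0.1) before exporting a span, which keeps approximately 10% of traces.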
Non-Functional Requirements
NFR1: Handle up to 100,000 traces per second
NFR2: Trace query API response latency under 500 ms at p99
NFR3: System availability of 99.9%
NFR4: Data retention for 7 days
NFR5: Minimal overhead on instrumented services (<5ms added latency per request)
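The targets above imply some rough capacity numbers, sketched below as a back-of-envelope calculation. The ingest rate, retention, and availability come from NFR1, NFR4, and NFR3. The spans-per-trace and bytes-per-span figures are placeholder assumptions chosen only to show the arithmetic, and the function names are hypothetical.

```python
TRACES_PER_SECOND = 100_000  # NFR1 (peak rate, so results are upper bounds)
RETENTION_DAYS = 7           # NFR4
AVAILABILITY = 0.999         # NFR3

# Assumptions for illustration only; not stated in the requirements.
SPANS_PER_TRACE = 10
BYTES_PER_SPAN = 500

SECONDS_PER_DAY = 86_400


def retained_traces() -> int:
    """Traces held at once if the peak rate were sustained with no sampling."""
    return TRACES_PER_SECOND * SECONDS_PER_DAY * RETENTION_DAYS


def raw_storage_bytes() -> int:
    """Uncompressed, unreplicated storage under the assumed span sizes."""
    return retained_traces() * SPANS_PER_TRACE * BYTES_PER_SPAN


def downtime_minutes(days: int) -> float:
    """Downtime budget allowed by the availability target over `days` days."""
    return (1 - AVAILABILITY) * days * 24 * 60
```

At a sustained 100,000 traces per second, 7 days of retention corresponds to 60.48 billion traces before sampling. Under the assumed 10 spans per trace and 500 bytes per span, that is about 302 TB of raw data, before compression, indexing, or replication. A 99.9% availability target allows about 43.2 minutes of downtime in a 30-day period.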