
Throughput, latency, and availability in HLD - System Design Guide

Problem Statement
When a system cannot handle enough requests per second, users experience slow responses or failures. If responses take too long, users get frustrated or abandon the service. If the system is often down or unreachable, users lose trust and stop using it.
Solution
Throughput measures how many requests a system can handle per second, ensuring capacity matches demand. Latency measures the time taken to respond to each request, keeping responses fast and smooth. Availability ensures the system stays up and reachable most of the time, minimizing downtime and failures.
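All three metrics can be computed directly from a request log. A minimal sketch, using hypothetical sample data (latency in seconds and a success flag per request, observed over a fixed measurement window):

```python
import statistics

# Hypothetical request log: (latency_seconds, succeeded) per request,
# all observed within a 10-second measurement window.
requests = [(0.120, True), (0.250, True), (0.180, False), (0.090, True),
            (0.300, True), (0.110, True), (0.400, False), (0.150, True)]
window_seconds = 10

throughput = len(requests) / window_seconds        # requests handled per second
latencies = sorted(lat for lat, _ in requests)
p50 = statistics.median(latencies)                 # typical response time
p95 = latencies[int(0.95 * (len(latencies) - 1))]  # tail response time
availability = sum(ok for _, ok in requests) / len(requests)  # success ratio

print(f"throughput:   {throughput:.1f} req/s")
print(f"p50 latency:  {p50 * 1000:.0f} ms")
print(f"availability: {availability:.1%}")
```

Note that latency is usually reported as percentiles (p50, p95, p99) rather than an average, because a few slow requests can hide behind a healthy-looking mean.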
Architecture
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Clients     │──────▶│ Load Balancer │──────▶│   Servers     │
└───────────────┘       └───────────────┘       └───────────────┘
        │                      │                       │
        │                      │                       │
        ▼                      ▼                       ▼
  Throughput: Requests   Latency: Response Time   Availability: Uptime

This diagram shows clients sending requests through a load balancer to servers. Throughput is the number of requests handled, latency is the response time, and availability is the system uptime.

Trade-offs
✓ Pros
Measuring throughput helps ensure the system can handle expected user load without bottlenecks.
Monitoring latency improves user experience by keeping response times low and predictable.
Ensuring high availability builds user trust by minimizing downtime and service interruptions.
✗ Cons
Optimizing for high throughput may increase latency if resources are stretched thin.
Reducing latency often requires more expensive hardware or caching, increasing costs.
Achieving very high availability can add complexity and cost due to redundancy and failover mechanisms.
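The tension between throughput and latency can be made concrete with Little's Law: the average number of requests in flight equals throughput times average latency. A short sketch with assumed target numbers:

```python
# Little's Law: concurrency = throughput x average latency.
# Hypothetical targets: 2,000 req/s at 50 ms average latency.
target_throughput = 2000  # requests per second
avg_latency = 0.050       # seconds

concurrency = target_throughput * avg_latency
print(f"{concurrency:.0f} requests in flight on average")
```

If the servers cannot hold that many concurrent requests, queues build up and latency rises; cutting latency in half halves the required concurrency at the same throughput.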
When to Use
Use these metrics when designing systems expected to serve thousands of requests per second or more, to require fast user interactions, and to stay reliable with minimal downtime.
When Not to Use
For small-scale systems with very low traffic (under 100 requests per second), or for non-critical applications where occasional downtime or delays are acceptable, rigorous tracking of these metrics may not be worth the overhead.
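Availability targets are often stated in "nines", and each extra nine shrinks the allowed downtime by a factor of ten. A quick calculation of the yearly downtime budget for a few common targets:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Downtime budget per year for common availability targets ("nines").
for availability in (0.99, 0.999, 0.9999):
    downtime_s = (1 - availability) * SECONDS_PER_YEAR
    print(f"{availability:.2%} uptime -> {downtime_s / 3600:.2f} h downtime/year")
```

For example, "three nines" (99.9%) allows roughly 8.76 hours of downtime per year, which is why each additional nine demands disproportionately more redundancy and cost.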
Real World Examples
Netflix
Netflix monitors throughput to handle millions of streaming requests, latency to keep video start times low, and availability to ensure users can watch anytime without interruption.
Amazon
Amazon tracks throughput to process thousands of orders per second, latency to keep page loads fast, and availability to maintain 24/7 shopping access.
Uber
Uber measures throughput to handle ride requests, latency to provide quick driver matching, and availability to keep the app operational globally.
Alternatives
Scalability
Scalability focuses on the system's ability to grow capacity, while throughput, latency, and availability are performance metrics to measure system behavior.
Use when: Choose scalability when planning for future growth beyond current throughput or latency limits.
Fault Tolerance
Fault tolerance ensures the system continues working despite failures; it directly supports availability but focuses on error-handling mechanisms rather than measurement.
Use when: Choose fault tolerance when system must remain operational despite hardware or software failures.
Summary
Throughput, latency, and availability are key metrics to measure system performance and reliability.
Throughput is about how many requests a system can handle, latency is about how fast it responds, and availability is about how often it stays up.
Balancing these metrics helps build systems that are fast, reliable, and scalable for real users.