HLD · System Design · ~15 mins

Throughput, latency, and availability in HLD - Deep Dive

Overview - Throughput, latency, and availability
What is it?
Throughput, latency, and availability are key measures to understand how well a system performs. Throughput is how much work a system can do in a given time. Latency is the delay before a system responds to a request. Availability is the percentage of time a system is up and working. Together, they help us know if a system is fast, reliable, and capable.
Why it matters
Without measuring throughput, latency, and availability, systems can become slow, unreliable, or overloaded without warning. This can frustrate users, cost business, or even cause failures in critical services such as banking or healthcare. These metrics help engineers design systems that meet user needs and keep services running smoothly.
Where it fits
Before learning these, you should understand basic system components like servers and networks. After this, you can learn about scaling systems, load balancing, and fault tolerance to improve these metrics in real systems.
Mental Model
Core Idea
Throughput measures how much work a system does, latency measures how fast it responds, and availability measures how often it works.
Think of it like...
Imagine a busy restaurant: throughput is how many meals the kitchen can prepare per hour, latency is how long a customer waits for their meal after ordering, and availability is how often the restaurant is open for business.
┌───────────────┐   ┌────────────────┐   ┌───────────────┐
│ Throughput    │   │ Latency        │   │ Availability  │
│ (Work done)   │   │ (Response time)│   │ (Uptime %)    │
└──────┬────────┘   └───────┬────────┘   └──────┬────────┘
       │                    │                   │
       │                    │                   │
       ▼                    ▼                   ▼
  System Capacity      User Wait Time     System Reliability
Build-Up - 7 Steps
1
Foundation: Understanding Throughput Basics
Concept: Throughput is the amount of work a system can complete in a given time.
Throughput tells us how many requests, transactions, or tasks a system can handle per second or minute. For example, a web server might handle 1000 page requests per second. It depends on resources like CPU, memory, and network speed.
Result
You can measure how busy a system is and if it can handle expected user demand.
Knowing throughput helps predict if a system can keep up with user load or if it needs more resources.
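As a rough sketch of how this is measured (the workload function and request count here are illustrative, not from the text above), throughput can be estimated by timing a batch of requests:

```python
import time

def handle_request() -> None:
    """Placeholder workload standing in for a real request handler."""
    sum(range(1000))

def measure_throughput(num_requests: int) -> float:
    """Return completed requests per second over a timed batch."""
    start = time.perf_counter()
    for _ in range(num_requests):
        handle_request()
    elapsed = time.perf_counter() - start
    return num_requests / elapsed

print(f"throughput ≈ {measure_throughput(10_000):,.0f} requests/sec")
```

In production, counters exported from the service itself replace this kind of ad-hoc timing, but the idea is the same: completed work divided by elapsed time.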
2
Foundation: Grasping Latency Fundamentals
Concept: Latency is the delay between sending a request and receiving a response.
Latency measures how fast a system reacts. For example, when you click a button, latency is the time until you see the result. It includes network delay, processing time, and any waiting in queues.
Result
You understand how user experience depends on system speed, not just capacity.
Low latency is crucial for smooth, responsive systems, especially in real-time applications.
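A minimal sketch of measuring per-request latency (the wrapper name and the sample workload are illustrative):

```python
import time

def timed_call(fn):
    """Run fn and return (result, latency in milliseconds)."""
    start = time.perf_counter()
    result = fn()
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms

result, latency_ms = timed_call(lambda: sum(range(100_000)))
print(f"result={result}, latency={latency_ms:.2f} ms")
```

Note that this captures only processing time on one machine; end-to-end latency as the user sees it also includes network delay and any time spent queued.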
3
Intermediate: Exploring What Availability Means
Concept: Availability is the percentage of time a system is operational and accessible.
Availability is often expressed as a percentage, like 99.9%, meaning the system is down only 0.1% of the time. It depends on hardware reliability, software bugs, and maintenance schedules.
Result
You can assess how dependable a system is for users and plan for downtime.
High availability ensures users can access services whenever needed, building trust and satisfaction.
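The arithmetic behind availability percentages is simple and worth internalizing; a quick sketch (function names are illustrative):

```python
HOURS_PER_YEAR = 24 * 365  # 8760

def availability_pct(uptime_hours: float, total_hours: float) -> float:
    """Availability as the percentage of time the system was operational."""
    return uptime_hours / total_hours * 100

def downtime_budget_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime per period permitted by an availability target."""
    return (1 - availability / 100) * period_hours

print(f"{downtime_budget_hours(99.9):.2f} hours/year")   # 99.9%  -> 8.76
print(f"{downtime_budget_hours(99.99):.2f} hours/year")  # 99.99% -> 0.88
```

Each extra "nine" shrinks the yearly downtime budget by a factor of ten, which is why each one is progressively harder and more expensive to achieve.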
4
Intermediate: Balancing Throughput and Latency
🤔 Before reading on: do you think increasing throughput always lowers latency, or can it sometimes increase latency? Commit to your answer.
Concept: Throughput and latency often trade off; pushing for more throughput can increase latency.
When a system handles more requests at once, it may queue them, causing delays. For example, a busy website might serve more users but respond slower. Engineers must balance these to meet goals.
Result
You realize that maximizing throughput without considering latency can harm user experience.
Understanding this tradeoff helps design systems that are both fast and capable under load.
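One classical way to see this tradeoff is the textbook M/M/1 queueing model (not mentioned above, but a standard lens for it): as the arrival rate approaches the system's service capacity, mean latency grows without bound.

```python
def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 queue (waiting + service).

    W = 1 / (mu - lambda): latency explodes as arrivals approach capacity.
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrival rate meets or exceeds capacity")
    return 1.0 / (service_rate - arrival_rate)

# Pushing throughput toward a 100 req/s capacity inflates latency:
for lam in (50, 80, 95, 99):
    print(f"{lam} req/s -> {mean_time_in_system(lam, 100) * 1000:.0f} ms")
```

Real systems are messier than M/M/1, but the shape of the curve holds: running near full utilization buys a little more throughput at the cost of sharply worse latency.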
5
Intermediate: Measuring and Monitoring Metrics
🤔 Before reading on: do you think measuring availability requires tracking uptime only, or also error rates? Commit to your answer.
Concept: Accurate measurement of throughput, latency, and availability requires monitoring tools and clear definitions.
Throughput is counted by completed requests, latency by response time samples, and availability by uptime and error rates. Tools like logs, metrics dashboards, and alerts help track these in real time.
Result
You can detect performance issues early and understand system health.
Knowing how to measure these metrics is key to maintaining and improving system quality.
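A toy sketch of deriving all three metrics from one request log (the record format and function name are illustrative; real systems use metrics pipelines rather than in-memory lists):

```python
def summarize(log: list[tuple[float, bool]], window_seconds: float):
    """Derive the three metrics from (latency_ms, succeeded) request records."""
    latencies = sorted(latency for latency, _ in log)
    throughput = len(log) / window_seconds                     # requests/sec
    p99 = latencies[int(0.99 * (len(latencies) - 1))]          # nearest-rank p99
    availability = sum(ok for _, ok in log) / len(log) * 100   # success rate, not just uptime
    return throughput, p99, availability

# 100 requests over 10 s: latencies 1..100 ms, one failed request
log = [(float(i), i != 100) for i in range(1, 101)]
print(summarize(log, 10.0))  # (10.0, 99.0, 99.0)
```

Notice that availability here counts failed responses against the system even while it was "up", matching the idea that error rates belong in the definition.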
6
Advanced: Improving Availability with Redundancy
🤔 Before reading on: do you think adding more servers always improves availability, or can it sometimes introduce new risks? Commit to your answer.
Concept: Using multiple servers and failover strategies increases availability but requires careful design.
Redundancy means having backup components ready if one fails. For example, multiple database replicas can serve requests if one goes down. However, synchronization and complexity can cause new issues.
Result
Systems become more reliable but need monitoring and testing to avoid hidden failures.
Understanding redundancy tradeoffs helps build highly available systems without unexpected downtime.
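The standard back-of-the-envelope model for redundancy (a simplification, since it assumes replicas fail independently) is that a system of N replicas is down only when all of them are down:

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Combined availability of N redundant replicas, assuming
    independent failures: 1 - (1 - a)^n.

    Real deployments rarely fail independently (shared networks,
    correlated bugs, bad deploys), so treat this as an upper bound.
    """
    return 1 - (1 - single) ** replicas

for n in (1, 2, 3):
    print(f"{n} replica(s): {parallel_availability(0.99, n):.6f}")
```

Two 99% replicas model out to 99.99%, which is exactly why the synchronization and failover machinery mentioned above matters: correlated failures are what keep real systems below this bound.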
7
Expert: Latency Variability and Its Impact
🤔 Before reading on: do you think average latency alone is enough to understand user experience, or do you need to consider variability too? Commit to your answer.
Concept: Latency variability (jitter) affects user experience more than average latency alone.
Even if average latency is low, spikes or inconsistent delays can frustrate users. For example, video calls suffer if latency jumps unpredictably. Systems must minimize both average latency and variability.
Result
You appreciate why engineers focus on tail latency and not just averages.
Recognizing latency variability is crucial for designing smooth, predictable systems.
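A small illustration (with made-up numbers) of why averages hide jitter: two services with the same mean latency can have wildly different tails.

```python
def p99(samples: list[float]) -> float:
    """Nearest-rank 99th-percentile latency."""
    s = sorted(samples)
    return s[int(0.99 * (len(s) - 1))]

# Two services with the SAME 100 ms average latency:
steady = [100.0] * 100
spiky = [80.0] * 95 + [480.0] * 5   # 5% of requests stall

assert sum(steady) / 100 == sum(spiky) / 100 == 100.0
print(p99(steady), p99(spiky))  # 100.0 480.0
```

A dashboard showing only the 100 ms average would rate both services identically, while one in twenty users of the spiky service waits nearly half a second.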
Under the Hood
Throughput depends on how many requests a system can process concurrently, limited by CPU, memory, disk, and network. Latency is affected by processing time, network delays, and queuing. Availability depends on hardware uptime, software stability, and recovery mechanisms like failover and backups. Systems use load balancers, caches, and replication to optimize these metrics.
Why is it designed this way?
These metrics were defined to quantify system performance in ways that matter to users and operators. Early systems focused on throughput, but as user experience became critical, latency and availability gained importance. Tradeoffs exist because resources are finite, and perfect availability is impossible, so designs balance cost and reliability.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Requests    │──────▶│  System Core  │──────▶│   Responses   │
│ (Input Load)  │       │ (Processing)  │       │ (Output)      │
└──────┬────────┘       └──────┬────────┘       └──────┬────────┘
       │                       │                       │
       │                       │                       │
       ▼                       ▼                       ▼
  Throughput             Latency Delay           Availability
 (Quantity)             (Time Delay)             (Uptime %)

System components:
[Load Balancer] → [Servers] → [Databases]
Failures here affect availability.
Queues here affect latency.
Resource limits affect throughput.
Myth Busters - 4 Common Misconceptions
Quick: Does higher throughput always mean lower latency? Commit to yes or no.
Common Belief: Higher throughput always means the system responds faster (lower latency).
Reality: Increasing throughput can cause queues and overload, increasing latency instead.
Why it matters: Assuming throughput and latency improve together can lead to overloaded systems and poor user experience.
Quick: Is 99.9% availability the same as zero downtime? Commit to yes or no.
Common Belief: High availability percentages mean the system never goes down.
Reality: Even 99.9% availability allows for about 8.76 hours of downtime per year.
Why it matters: Misunderstanding availability can cause unrealistic expectations and poor planning for maintenance or failures.
Quick: Does measuring uptime alone fully capture availability? Commit to yes or no.
Common Belief: Availability is just about how long the system is powered on and reachable.
Reality: Availability also depends on error rates and service correctness, not just uptime.
Why it matters: Ignoring errors during uptime can hide service degradation and harm users.
Quick: Can latency spikes be ignored if average latency is low? Commit to yes or no.
Common Belief: Only average latency matters; occasional spikes are not important.
Reality: Latency spikes (high tail latency) can severely degrade user experience even if averages look good.
Why it matters: Ignoring latency variability can cause unpredictable performance and user frustration.
Expert Zone
1
Throughput limits are often set not by hardware but by software bottlenecks such as lock contention.
2
Availability calculations must consider not only planned downtime but also partial failures and degraded modes.
3
Latency tail percentiles (e.g., 99th percentile) are more meaningful than averages for real user experience.
When NOT to use
Focusing only on throughput, latency, and availability can miss other important metrics like consistency, durability, or security. For example, in financial systems, correctness and auditability may be more critical. Alternatives include CAP theorem tradeoffs and quality-of-service metrics.
Production Patterns
Real systems use load balancers to distribute requests, caching to reduce latency, replication for availability, and circuit breakers to prevent overload. Monitoring tools track these metrics continuously, triggering alerts and auto-scaling to maintain performance.
Connections
Load Balancing
Builds-on
Understanding throughput and latency helps design load balancers that distribute work evenly to optimize performance.
Fault Tolerance
Builds-on
Availability concepts connect directly to fault tolerance strategies that keep systems running despite failures.
Human Reaction Time (Psychology)
Analogy and cross-domain insight
Knowing how humans perceive delays helps engineers set latency targets that feel fast and responsive.
Common Pitfalls
#1 Ignoring latency spikes and focusing only on average latency.
Wrong approach: The monitoring system reports an average latency of 100 ms, so performance is declared good, while a 99th-percentile latency of 1 second goes unnoticed.
Correct approach: Monitor and optimize tail-latency metrics such as the 95th or 99th percentile to ensure a consistent user experience.
Root cause: Not realizing that average latency hides variability and worst-case delays.
#2 Assuming 100% uptime is achievable and not planning for failures.
Wrong approach: Designing a system without redundancy or failover, expecting no downtime ever.
Correct approach: Implement redundancy, backups, and failover to achieve high but realistic availability targets.
Root cause: Unrealistic expectations about hardware and software reliability.
#3 Maximizing throughput by accepting unlimited queue growth.
Wrong approach: Allowing request queues to grow indefinitely to increase throughput without limits.
Correct approach: Use backpressure and rate limiting to keep queues manageable and latency low.
Root cause: Not understanding the tradeoff between throughput and latency.
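A minimal sketch of the correct approach, here as a bounded in-memory queue that rejects work when full (class and method names are illustrative; real systems more often use broker limits, semaphores, or rate limiters):

```python
from collections import deque

class BoundedQueue:
    """A queue that rejects work when full, trading raw throughput
    for bounded queueing delay."""

    def __init__(self, max_depth: int) -> None:
        self._items: deque = deque()
        self._max_depth = max_depth

    def offer(self, item) -> bool:
        """Accept item, or return False to signal backpressure
        (the caller should retry later or shed the request)."""
        if len(self._items) >= self._max_depth:
            return False
        self._items.append(item)
        return True

    def take(self):
        return self._items.popleft()

q = BoundedQueue(max_depth=2)
print(q.offer("a"), q.offer("b"), q.offer("c"))  # True True False
```

The rejected request fails fast with a clear signal, rather than sitting in an ever-growing queue and inflating latency for everyone behind it.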
Key Takeaways
Throughput, latency, and availability are fundamental metrics that describe how much work a system can do, how fast it responds, and how often it is operational.
Balancing these metrics is essential because improving one can negatively affect the others if not managed carefully.
Measuring these metrics accurately with the right tools is critical to maintaining system health and user satisfaction.
Advanced designs use redundancy, load balancing, and monitoring to optimize these metrics in real-world systems.
Understanding latency variability and realistic availability targets prevents common mistakes that degrade user experience and system reliability.