
Availability checking in LLD - System Design Guide

Problem Statement
When a system depends on external services or components, a failure or outage in one of them can leave the whole system unresponsive or crash it. If the system does not check that a dependency is available before using it, it may tie up resources waiting on calls that cannot succeed, or fail unexpectedly, which degrades user experience and stability.
Solution
Availability checking proactively verifies that a service or component is reachable and responsive before the system uses it. The check is typically a lightweight request or health check, and its result decides whether to proceed, retry, or fall back. Skipping calls to unavailable parts helps prevent cascading failures and improves resilience.
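The proceed, retry, or fall back decision can be sketched as a small helper. This is a minimal illustration, not a definitive implementation: the function name `call_with_availability_check` and its parameters are hypothetical, and the probe, service call, and fallback are passed in as plain callables so the example runs without a network.

```python
import time
from typing import Callable


def call_with_availability_check(
    is_available: Callable[[], bool],
    call_service: Callable[[], str],
    fallback: Callable[[], str],
    retries: int = 2,
    delay_seconds: float = 0.0,
) -> str:
    """Probe the dependency first; re-probe a few times, then fall back."""
    for attempt in range(retries + 1):
        if is_available():
            return call_service()      # proceed: the dependency looks healthy
        if attempt < retries:
            time.sleep(delay_seconds)  # retry: wait before probing again
    return fallback()                  # fall back: the dependency stayed unavailable


# Usage with stand-in callables: the first probe fails, the second succeeds.
probe_results = iter([False, True])
result = call_with_availability_check(
    is_available=lambda: next(probe_results),
    call_service=lambda: "live data",
    fallback=lambda: "cached data",
)
print(result)  # live data
```

If every probe fails, the helper returns the fallback value instead of calling the service at all.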
Architecture
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Client System │──────▶│ Availability  │──────▶│ External      │
│               │       │ Checker       │       │ Service       │
└───────────────┘       └───────────────┘       └───────────────┘
         │                      │                       │
         │                      │◀──────────────────────┤
         │                      │   Health Check Result │
         │◀─────────────────────┤                       │

This diagram shows a client system sending requests to an availability checker, which performs health checks on an external service before allowing the client to proceed.

Trade-offs
✓ Pros
Prevents calls to unavailable services, reducing wasted resources.
Improves user experience by failing fast or using fallback options.
Helps isolate failures and avoid cascading system crashes.
✗ Cons
Adds latency due to extra health check requests.
Requires maintenance of health check logic and endpoints.
May produce false negatives if health checks are too strict or the network is unstable.
Use when your system depends on external services or components that can fail or become unreachable, especially if those dependencies affect user-facing features or critical workflows.
Avoid if your system is fully self-contained with no external dependencies, or if the overhead of health checks outweighs the benefit at very low scale (e.g., under 100 requests per minute).
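One common way to soften the latency cost listed under Cons is to reuse a recent health check result for a short time instead of probing before every call. The sketch below assumes a hypothetical `CachedAvailabilityChecker` class with a `ttl_seconds` parameter; the probe and the clock are injected so the example runs without a real service.

```python
import time
from typing import Callable, Optional


class CachedAvailabilityChecker:
    """Reuses the last probe result until it is older than ttl_seconds."""

    def __init__(
        self,
        probe: Callable[[], bool],
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._probe = probe
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_result: Optional[bool] = None  # None until the first probe
        self._checked_at = 0.0

    def is_available(self) -> bool:
        now = self._clock()
        # Probe only when there is no result yet or the cached one has expired.
        if self._last_result is None or now - self._checked_at >= self._ttl:
            self._last_result = self._probe()
            self._checked_at = now
        return self._last_result


# Usage with a fake probe and a fake clock.
probe_calls = []
fake_time = [0.0]


def fake_probe() -> bool:
    probe_calls.append(1)
    return True


checker = CachedAvailabilityChecker(fake_probe, ttl_seconds=5.0, clock=lambda: fake_time[0])
checker.is_available()   # probes
checker.is_available()   # served from the cache
fake_time[0] = 6.0
checker.is_available()   # cache expired, probes again
print(len(probe_calls))  # 2
```

The trade-off is staleness: a longer TTL saves more probes but takes longer to notice that a service went down or came back.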
Real World Examples
Netflix
Netflix uses availability checking to monitor microservices before routing user requests, ensuring only healthy services receive traffic to maintain smooth streaming.
Uber
Uber performs availability checks on payment gateways and mapping services to quickly detect failures and switch to fallback options, preventing user transaction failures.
Amazon
Amazon checks availability of inventory and shipping services before confirming orders, avoiding order processing delays or errors.
Code Example
The before code calls the external service directly without checking if it is available, risking failures. The after code adds an AvailabilityChecker that sends a health check request before calling the service, preventing calls when the service is down.
### Before Availability Checking (naive call)
class ExternalServiceClient:
    def get_data(self):
        # Directly call external service without checking
        response = self.call_service()
        return response

    def call_service(self):
        # Simulate service call
        return "data"


### After Applying Availability Checking
import requests

class AvailabilityChecker:
    def __init__(self, health_url):
        self.health_url = health_url

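    def is_available(self):
        # Lightweight probe: healthy only if the endpoint answers 200 within 1 second
        try:
            response = requests.get(self.health_url, timeout=1)
            return response.status_code == 200
        except requests.RequestException:
            # Timeouts, connection errors, and other request failures count as unavailable
            return False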

class ExternalServiceClientWithCheck:
    def __init__(self, health_url):
        self.checker = AvailabilityChecker(health_url)

    def get_data(self):
        # Fail fast rather than calling a service whose health check failed
        if not self.checker.is_available():
            raise ConnectionError("Service unavailable")
        response = self.call_service()
        return response

    def call_service(self):
        # Simulate service call
        return "data"
Alternatives
Circuit Breaker
A circuit breaker tracks the outcome of real calls rather than probing ahead of time. After repeated failures it temporarily stops calls to the failing service, then allows calls again after a timeout so the system recovers automatically.
Use when: you want to prevent repeated calls to a failing service and recover automatically, without manual intervention.
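A minimal circuit breaker can be sketched as follows. The class and parameter names (`CircuitBreaker`, `failure_threshold`, `reset_timeout`, `CircuitOpenError`) are assumptions made for illustration, and the clock is injected so the example does not have to wait in real time.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], str]) -> str:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, call skipped")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Usage with a fake clock and a call that always fails.
fake_time = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: fake_time[0])


def failing() -> str:
    raise ConnectionError("service down")


outcomes = []
for _ in range(3):
    try:
        breaker.call(failing)
    except CircuitOpenError:
        outcomes.append("skipped")
    except ConnectionError:
        outcomes.append("failed")

fake_time[0] = 11.0  # past the reset timeout, so a trial call is allowed
outcomes.append(breaker.call(lambda: "data"))
print(outcomes)  # ['failed', 'failed', 'skipped', 'data']
```

Unlike a health check, the breaker learns about failures only from real traffic, so the first few calls after an outage still fail before it opens.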
Retry Pattern
The retry pattern re-attempts a failed call several times, without proactively checking availability first.
Use when: failures are transient and you expect the service to recover quickly, so explicit health checks are unnecessary.
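A retry loop might look like the sketch below. The exponential backoff between attempts is an added assumption (the pattern itself only requires repeated attempts), and the name `call_with_retry` and its parameters are hypothetical. The sleep function is injected so the example records delays instead of waiting.

```python
import time
from typing import Callable


def call_with_retry(
    func: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call func up to `attempts` times (at least 1), doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: a stand-in call that fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "data"


delays = []
result = call_with_retry(flaky, attempts=3, base_delay=0.1, sleep=delays.append)
print(result, len(delays))  # data 2
```

Only `ConnectionError` is retried here; any other exception propagates immediately, which keeps the loop from masking programming errors.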
Summary
Availability checking prevents system failures by verifying external services are reachable before use.
It improves resilience by avoiding calls to down services and enabling fallback strategies.
This pattern is essential when systems depend on unreliable or variable external components.

Practice

1. What is the main purpose of availability checking in system design?
easy
A. To create backups of system data
B. To increase the speed of data processing
C. To encrypt user data for security
D. To determine if a resource is free or ready to use

Solution

  1. Step 1: Understand the concept of availability checking

    Availability checking is about verifying if a resource like a room, item, or slot is free to be used or booked.
  2. Step 2: Identify the main goal

    The main goal is to know if the resource is ready or free, not about speed, security, or backups.
  3. Final Answer:

    To determine if a resource is free or ready to use -> Option D
  4. Quick Check:

    Availability checking = resource readiness [OK]
Hint: Availability checking asks whether a resource is free [OK]
Common Mistakes:
  • Confusing availability with performance optimization
  • Mixing availability with security features
  • Thinking availability means data backup
2. Which of the following code snippets correctly checks if a room is available given a list of booked rooms booked_rooms = [101, 102, 103] and a requested room requested_room = 104?
easy
A. if requested_room in booked_rooms: print('Available')
B. if requested_room == booked_rooms: print('Available')
C. if requested_room not in booked_rooms: print('Available')
D. if requested_room > booked_rooms: print('Available')

Solution

  1. Step 1: Understand the list and requested room

    booked_rooms contains rooms already taken: 101, 102, 103. requested_room is 104.
  2. Step 2: Check correct condition for availability

    Room is available if requested_room is NOT in booked_rooms. So, 'if requested_room not in booked_rooms' is correct.
  3. Final Answer:

    if requested_room not in booked_rooms: print('Available') -> Option C
  4. Quick Check:

    Not in booked_rooms means available [OK]
Hint: Check 'not in' to confirm availability [OK]
Common Mistakes:
  • Using 'in' instead of 'not in' to check availability
  • Comparing equality of a number to a list
  • Using greater than operator on list
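The `not in` check from question 2 can be wrapped in a small function. This sketch uses a hypothetical helper name, `is_room_available`, and stores the booked rooms in a set rather than a list, which is an assumption made here because membership tests on a set stay fast as the number of bookings grows.

```python
def is_room_available(requested_room: int, booked_rooms: set) -> bool:
    """A room is available when it does not appear among the booked rooms."""
    return requested_room not in booked_rooms


booked_rooms = {101, 102, 103}
print(is_room_available(104, booked_rooms))  # True
print(is_room_available(102, booked_rooms))  # False
```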
3. Given the following code, what will be the output?
booked_slots = {"9AM": True, "10AM": False}
requested_slot = "10AM"
if not booked_slots.get(requested_slot, False):
    print("Slot Available")
else:
    print("Slot Booked")
medium
A. Slot Available
B. Slot Booked
C. KeyError
D. No output

Solution

  1. Step 1: Understand the dictionary and requested slot

    booked_slots maps times to True (booked) or False (free). "10AM" is False, meaning free.
  2. Step 2: Evaluate the condition

    booked_slots.get("10AM", False) returns False. 'not False' is True, so it prints "Slot Available".
  3. Final Answer:

    Slot Available -> Option A
  4. Quick Check:

    False means free, so output is Slot Available [OK]
Hint: False means free slot, so print available [OK]
Common Mistakes:
  • Assuming True means available instead of booked
  • Expecting KeyError when key exists
  • Ignoring default value in get()
4. Identify the bug in the following availability check code:
def is_available(stock, requested):
    if requested > stock:
        return True
    else:
        return False

print(is_available(5, 10))
medium
A. The function should return False when requested is greater than stock
B. The function is correct and returns True
C. The condition should be 'requested <= stock' to return True
D. The function should compare 'stock > requested' instead

Solution

  1. Step 1: Analyze the condition logic

    Current code returns True if requested > stock, meaning more requested than available stock.
  2. Step 2: Correct logic for availability

    Availability means the stock is at least the requested quantity (stock >= requested). So when requested > stock, the function must return False.
  3. Final Answer:

    The function should return False when requested is greater than stock -> Option A
  4. Quick Check:

    Requested > stock means not available [OK]
Hint: Availability means stock >= requested, else False [OK]
Common Mistakes:
  • Returning True when requested exceeds stock
  • Confusing greater than with less than
  • Not testing with example values
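A corrected version of the function from question 4, following the hint that availability means stock >= requested, could look like this sketch. The example values in the usage lines are chosen here to cover the boundary case.

```python
def is_available(stock: int, requested: int) -> bool:
    """Return True only when the stock covers the requested quantity."""
    return requested <= stock


print(is_available(5, 10))  # False: more requested than in stock
print(is_available(5, 5))   # True: the request exactly matches the stock
print(is_available(5, 3))   # True
```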
5. You are designing an availability checking system for a hotel booking platform. Which approach best ensures high availability and scalability when checking room availability in real-time?
hard
A. Use a centralized database with locking to check and update availability synchronously
B. Cache availability data in memory with periodic sync to the database and use optimistic concurrency
C. Check availability by scanning all booking records on every request without caching
D. Allow double booking and resolve conflicts manually later

Solution

  1. Step 1: Understand requirements for high availability and scalability

    System must respond quickly and handle many requests without blocking.
  2. Step 2: Evaluate options for real-time availability checking

    Option B combines in-memory caching with optimistic concurrency. This reduces database load and avoids locks, which improves scalability and availability.
  3. Final Answer:

    Cache availability data in memory with periodic sync to the database and use optimistic concurrency -> Option B
  4. Quick Check:

    Caching + optimistic concurrency = scalable availability [OK]
Hint: Cache data and use optimistic concurrency for scalable availability [OK]
Common Mistakes:
  • Using locking, which creates bottlenecks
  • Scanning all records, which slows responses
  • Allowing double booking, which causes problems for users
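The optimistic concurrency part of option B can be illustrated with a version number on each room record. This is a single-threaded sketch under stated assumptions: `RoomStore`, `RoomRecord`, and `book_if_unchanged` are hypothetical names, the dictionary stands in for the database, and a real store would need to perform the version comparison and the update as one atomic operation.

```python
from dataclasses import dataclass


@dataclass
class RoomRecord:
    available: bool
    version: int


class RoomStore:
    """Stand-in for the database: a booking succeeds only if the version still matches."""

    def __init__(self):
        self._rooms = {101: RoomRecord(available=True, version=1)}

    def read(self, room_id: int) -> RoomRecord:
        record = self._rooms[room_id]
        return RoomRecord(record.available, record.version)  # return a copy

    def book_if_unchanged(self, room_id: int, expected_version: int) -> bool:
        record = self._rooms[room_id]
        # Reject the write if someone else changed the record since it was read.
        if record.version != expected_version or not record.available:
            return False
        record.available = False
        record.version += 1
        return True


# Usage: two clients read the same snapshot, but only the first booking wins.
store = RoomStore()
snapshot_a = store.read(101)
snapshot_b = store.read(101)
first = store.book_if_unchanged(101, snapshot_a.version)
second = store.book_if_unchanged(101, snapshot_b.version)
print(first, second)  # True False
```

The losing client sees the rejected write and can re-read the room and report it as unavailable, so no lock is held while users browse.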