
Emergency handling in LLD - System Design Guide

Problem Statement
When unexpected errors or failures occur during program execution, the system can crash or behave unpredictably, causing loss of data or poor user experience. Without a structured way to handle these emergencies, the entire application may stop working or produce incorrect results.
Solution
Emergency handling uses structured code blocks to catch and manage errors gracefully. It allows the program to detect problems, respond appropriately, and continue running or shut down safely. This mechanism separates normal logic from error management, improving reliability and user trust.
Architecture
Start -> Try block (normal code)
  no error: Success path -> End
  error:    Except block (handle error) -> Recovery or Cleanup -> End

This diagram shows the flow of normal execution inside a try block and how errors divert control to an except block for handling before continuing or ending.
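The flow above can be traced in a small Python sketch. The function name `parse_port` and the `events` list are hypothetical, added only to record which branch ran; the `else` and `finally` clauses are standard Python and correspond to the success path and the cleanup step.

```python
def parse_port(text):
    """Return (port, events); port is None when text is not an integer."""
    events = []  # records which branches ran, for illustration only
    try:
        port = int(text)            # normal code: may raise ValueError
    except ValueError:
        events.append("except")     # error path: handle the failure
        port = None
    else:
        events.append("else")       # success path: runs only if nothing was raised
    finally:
        events.append("finally")    # cleanup: runs on both paths
    return port, events


print(parse_port("8080"))   # (8080, ['else', 'finally'])
print(parse_port("http"))   # (None, ['except', 'finally'])
```

Keeping only the risky statement inside `try` and moving follow-up work into `else` narrows what the handler can catch by accident.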

Trade-offs
✓ Pros
Prevents application crashes by catching runtime errors.
Separates error handling from main logic, improving code clarity.
Allows graceful recovery or cleanup after failures.
Improves user experience by providing meaningful error messages.
✗ Cons
Overusing emergency handling can hide bugs if errors are silently caught.
Handling all exceptions broadly may mask specific issues needing fixes.
Can add performance overhead, mainly when exceptions are actually raised; a try block that never triggers usually costs little.
Use emergency handling in any system where runtime errors can occur, especially in user input processing, file operations, network calls, or external API usage. It is essential when system uptime and data integrity are critical.
Avoid using emergency handling to control normal program flow or replace proper validation. Do not catch exceptions without proper handling or logging, as this can hide real problems.
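A minimal sketch of these guidelines, assuming a hypothetical settings loader: `load_settings` catches one specific exception and logs it, while `get_timeout` uses ordinary validation for the expected "key missing" case instead of relying on exceptions for control flow. The function names and the default of 30 are illustrative assumptions.

```python
import json
import logging

logger = logging.getLogger(__name__)


def load_settings(raw_text):
    """Parse JSON settings; return an empty dict if the text is malformed."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Specific exception, logged with context, so the failure stays visible.
        logger.warning("Invalid settings JSON: %s", exc)
        return {}
    # Other errors (for example TypeError when raw_text is None) still propagate,
    # because they most likely point to a bug in the caller.


def get_timeout(settings):
    """Read the timeout with plain validation rather than try/except."""
    timeout = settings.get("timeout", 30)   # expected case: key may be absent
    if not isinstance(timeout, (int, float)) or timeout <= 0:
        raise ValueError(f"timeout must be a positive number, got {timeout!r}")
    return timeout


print(get_timeout(load_settings('{"timeout": 5}')))  # 5
print(get_timeout(load_settings("{bad json")))       # 30
```

A bare `except:` in `load_settings` would also swallow the `TypeError`, which is the kind of masking the cons above warn about.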
Real World Examples
Netflix
Netflix uses emergency handling in their streaming service to catch network errors and retry connections without interrupting the user experience.
Stripe
Stripe employs emergency handling to manage payment gateway failures gracefully, ensuring transactions are retried or users are informed without crashing the payment flow.
Uber
Uber uses emergency handling to catch GPS or API errors in their app, allowing fallback mechanisms to maintain service continuity.
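The retry and fallback ideas mentioned in these examples can be sketched generically. This is not how any of these companies implement it; `call_with_retry`, `flaky_fetch`, the injectable `sleep` parameter, and the "cached data" fallback are all hypothetical names chosen for illustration.

```python
import time


def call_with_retry(operation, fallback, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation() up to `attempts` times, then return fallback()."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)   # exponential backoff between tries
    return fallback()                              # all attempts failed


calls = {"count": 0}


def flaky_fetch():
    """Simulated network call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "live data"


# sleep is replaced with a no-op so the example runs instantly.
result = call_with_retry(flaky_fetch, fallback=lambda: "cached data", sleep=lambda s: None)
print(result)  # live data
```

Only `ConnectionError` is retried here; other exceptions propagate immediately, which keeps programming errors from being retried silently.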
Code Example
The before code crashes if the file does not exist because it does not handle exceptions. The after code uses a try-except block to catch the FileNotFoundError, prints a friendly message, and returns None to allow the program to continue safely.
### Before: No emergency handling

def read_file(filename):
    with open(filename, 'r') as f:
        data = f.read()
    return data

print(read_file('missing.txt'))  # Raises FileNotFoundError if the file does not exist


### After: With emergency handling

def read_file(filename):
    try:
        with open(filename, 'r') as f:
            data = f.read()
        return data
    except FileNotFoundError:
        print(f"Error: File {filename} not found.")
        return None

print(read_file('missing.txt'))  # Prints error message and returns None
Alternatives
Return codes
Instead of exceptions, functions return status codes indicating success or failure, requiring manual checks after each call.
Use when: Use when working in low-level or performance-critical systems where exceptions are too costly or unavailable.
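A minimal sketch of the return-code style, written in Python for consistency with the rest of this guide even though the style is more common in languages without exceptions. The `Status` enum and `find_user` function are hypothetical.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    NOT_FOUND = 1
    INVALID = 2


def find_user(users, user_id):
    """Return (status, value) instead of raising an exception."""
    if not isinstance(user_id, int):
        return Status.INVALID, None
    if user_id not in users:
        return Status.NOT_FOUND, None
    return Status.OK, users[user_id]


status, name = find_user({1: "Ada"}, 2)
if status is not Status.OK:          # the caller must remember this check
    print(f"lookup failed: {status.name}")   # lookup failed: NOT_FOUND
```

Nothing forces the caller to inspect `status`, which is the main weakness of this approach compared with exceptions.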
Error callbacks
Errors are handled via callback functions passed to asynchronous operations rather than synchronous try-except blocks.
Use when: Use in asynchronous or event-driven systems where errors must be handled after operations complete.
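A minimal sketch of the callback style, assuming a hypothetical `read_file_async` helper that does its work on a thread and reports the outcome through `on_success` and `on_error`. The temporary directory is used only to guarantee that the file does not exist.

```python
import os
import tempfile
import threading


def read_file_async(path, on_success, on_error):
    """Read a file on a worker thread and report the result via callbacks."""
    def worker():
        try:
            with open(path, "r") as f:
                data = f.read()
        except OSError as exc:
            on_error(exc)         # failure is delivered to the error callback
        else:
            on_success(data)      # success is delivered separately

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


missing_path = os.path.join(tempfile.mkdtemp(), "missing.txt")  # does not exist
results = []
t = read_file_async(
    missing_path,
    on_success=lambda data: results.append(("ok", data)),
    on_error=lambda exc: results.append(("error", type(exc).__name__)),
)
t.join()          # wait only so the example prints deterministically
print(results)    # [('error', 'FileNotFoundError')]
```

The caller's own `try` block cannot catch the error here, because the failure happens later on another thread; the callback is the only place it surfaces.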
Summary
Emergency handling prevents crashes by catching and managing runtime errors.
It separates error management from normal code, improving reliability and user experience.
Proper use requires catching specific exceptions and avoiding misuse for normal control flow.

Practice

1. What is the primary goal of an emergency handling system in system design?
easy
A. To detect problems quickly and protect people and property
B. To increase system performance under normal conditions
C. To reduce the cost of hardware components
D. To provide detailed analytics for marketing purposes

Solution

  1. Step 1: Understand the purpose of emergency handling

    Emergency handling systems are designed to detect issues fast and act to prevent harm.
  2. Step 2: Identify the main goal

    The main goal is to protect people and property by quick detection and response.
  3. Final Answer:

    To detect problems quickly and protect people and property -> Option A
  4. Quick Check:

    Emergency handling = fast detection and protection [OK]
Hint: Focus on safety and speed in emergencies [OK]
Common Mistakes:
  • Confusing emergency handling with performance optimization
  • Thinking it is about cost reduction
  • Assuming it is for marketing analytics
2. Which component is NOT typically part of an emergency handling system?
easy
A. Safety action controller
B. Alerting system
C. Detection module
D. User interface for marketing

Solution

  1. Step 1: List typical components

    Emergency handling systems usually have detection, alerting, safety actions, and logging.
  2. Step 2: Identify the unrelated component

    User interface for marketing is unrelated to emergency handling functions.
  3. Final Answer:

    User interface for marketing -> Option D
  4. Quick Check:

    Marketing UI ≠ emergency handling component [OK]
Hint: Exclude marketing from emergency system parts [OK]
Common Mistakes:
  • Including unrelated business components
  • Confusing alerting with marketing notifications
  • Ignoring safety action controllers
3. Consider this simplified emergency system flow:
if sensor.detect(): alert.send(); safety.activate(); log.record()
What happens if sensor.detect() returns false?
medium
A. Alert, safety, and log actions all execute
B. Only alert and safety actions execute
C. No actions execute
D. Only log action executes

Solution

  1. Step 1: Analyze the if condition

    The actions alert.send(), safety.activate(), and log.record() run only if sensor.detect() is true.
  2. Step 2: Determine behavior when sensor.detect() is false

    If sensor.detect() returns false, the code block inside if does not run, so no actions execute.
  3. Final Answer:

    No actions execute -> Option C
  4. Quick Check:

    False detection = no emergency actions [OK]
Hint: If condition false means skip all inside actions [OK]
Common Mistakes:
  • Assuming log always runs regardless of detection
  • Thinking alert or safety run without detection
  • Confusing else behavior when none is given
4. In an emergency system, this code snippet causes a problem:
if sensor.detect():
    alert.send()
    safety.activate()
log.record()

What is the main issue?
medium
A. Missing indentation causes log.record() to run always
B. safety.activate() is outside the if block
C. alert.send() is not called properly
D. log.record() runs even if no detection

Solution

  1. Step 1: Check code indentation

    log.record() is not indented under the if, so it runs always.
  2. Step 2: Understand impact

    log.record() runs even when sensor.detect() is false, which is incorrect behavior.
  3. Final Answer:

    Missing indentation causes log.record() to run always -> Option A
  4. Quick Check:

    Indentation controls conditional execution [OK]
Hint: Indent all emergency actions inside detection check [OK]
Common Mistakes:
  • Ignoring indentation importance
  • Assuming all lines are inside if by default
  • Confusing which lines run conditionally
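The effect of the indentation in this question can be checked with a small runnable sketch. The functions `run_buggy` and `run_fixed` are hypothetical; they record action names in a list instead of calling real `alert`, `safety`, and `log` objects.

```python
def run_buggy(detected):
    """log.record sits outside the if, so it runs on every call."""
    calls = []
    if detected:
        calls.append("alert.send")
        calls.append("safety.activate")
    calls.append("log.record")        # not indented under the if
    return calls


def run_fixed(detected):
    """All three actions are indented under the detection check."""
    calls = []
    if detected:
        calls.append("alert.send")
        calls.append("safety.activate")
        calls.append("log.record")
    return calls


print(run_buggy(False))  # ['log.record']
print(run_fixed(False))  # []
```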
5. You design an emergency system that must alert multiple teams and log events reliably. Which design approach best ensures alerts are sent even if one alert service fails?
hard
A. Send alerts sequentially and stop on first failure
B. Send alerts in parallel with retries and fallback logging
C. Send alerts only to the primary team to reduce complexity
D. Log events only after all alerts succeed

Solution

  1. Step 1: Understand reliability needs

    To ensure alerts reach multiple teams, sending in parallel avoids blocking on one failure.
  2. Step 2: Use retries and fallback logging

    Retries help recover from temporary failures; fallback logging records failures for later review.
  3. Final Answer:

    Send alerts in parallel with retries and fallback logging -> Option B
  4. Quick Check:

    Parallel + retries = reliable alerting [OK]
Hint: Use parallel alerts with retries for reliability [OK]
Common Mistakes:
  • Stopping alerts on first failure
  • Ignoring retries and fallback mechanisms
  • Reducing alert recipients to simplify
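Option B can be sketched with the standard library's `ThreadPoolExecutor`. This is one possible shape under stated assumptions, not a production design: `send_with_retry`, `alert_all`, the channel names, and `broken_pager` are hypothetical, and the broad `except Exception` is deliberate here because every failure is logged and retried rather than hidden.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("alerts")


def send_with_retry(name, send, message, attempts=3):
    """Call send(message) up to `attempts` times; return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            send(message)
            return True
        except Exception as exc:  # broad on purpose: any channel failure is retried
            logger.warning("alert to %s failed (attempt %d/%d): %s",
                           name, attempt, attempts, exc)
    return False


def alert_all(channels, message):
    """Send to every channel in parallel; return {name: delivered}."""
    with ThreadPoolExecutor(max_workers=max(1, len(channels))) as pool:
        futures = {name: pool.submit(send_with_retry, name, send, message)
                   for name, send in channels.items()}
        results = {name: future.result() for name, future in futures.items()}
    for name, delivered in results.items():
        if not delivered:
            # Fallback logging: record the undelivered alert for later review.
            logger.error("FALLBACK: alert to %s undelivered: %s", name, message)
    return results


def broken_pager(message):
    """Simulated alert service that is always down."""
    raise ConnectionError("pager service down")


delivered = []
results = alert_all({"ops": delivered.append, "security": broken_pager}, "smoke detected")
print(results)  # {'ops': True, 'security': False}
```

Because each channel runs in its own task, the failing `security` channel does not delay or block delivery to `ops`.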