
Cancellation and refund policy in LLD - Scalability & System Analysis

Scalability Analysis - Cancellation and refund policy
Growth Table: Cancellation and Refund Policy System

| Scale | Users | Requests per Second (RPS) | Data Storage | System Changes |
|---|---|---|---|---|
| Small | 100 | ~10 | Few MBs (policy rules, logs) | Single server, simple DB, no caching |
| Medium | 10,000 | ~1,000 | GBs (policy versions, user requests) | DB read replicas, caching, load balancer |
| Large | 1,000,000 | ~50,000 | TBs (logs, audit trails, refunds) | Sharded DB, distributed cache, microservices |
| Very Large | 100,000,000 | ~5,000,000 | Petabytes (archived data, analytics) | Multi-region deployment, CDN, event-driven architecture |

First Bottleneck

At small to medium scale, the database is the first bottleneck. Every cancellation and refund request needs consistent reads and writes against policy data and transaction records, so the DB must absorb many concurrent queries and updates, especially during peak refund periods.

Scaling Solutions
  • Database Scaling: Use read replicas to distribute read load. Implement connection pooling to manage DB connections efficiently.
  • Caching: Cache static policy rules and frequently accessed refund statuses to reduce DB hits.
  • Horizontal Scaling: Add more application servers behind a load balancer to handle increased request volume.
  • Sharding: Partition the database by user ID or region to distribute write load and improve performance.
  • Event-Driven Architecture: Use message queues to process refunds asynchronously, reducing synchronous DB load.
  • Multi-Region Deployment: Deploy services closer to users to reduce latency and distribute traffic.
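
The caching bullet above can be illustrated with a minimal cache-aside sketch. It assumes policy rules change rarely, so a short time-to-live is acceptable. The class name `PolicyCache`, the `loader` callback, and the 300-second default are hypothetical choices for illustration, not part of the original design.

```python
import time
from typing import Callable, Dict, Tuple


class PolicyCache:
    """Cache-aside lookup with a time-to-live for rarely changing policy rules."""

    def __init__(
        self,
        loader: Callable[[str], dict],      # hypothetical function that reads the DB
        ttl_seconds: float = 300.0,         # assumed freshness window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, policy_id: str) -> dict:
        now = self._clock()
        entry = self._entries.get(policy_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                 # cache hit: no DB access
        policy = self._loader(policy_id)    # miss or expired: one DB read
        self._entries[policy_id] = (now, policy)
        return policy
```

Repeated reads of the same policy within the TTL are served from memory, so only the first read (and one read per expiry) reaches the database.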
Back-of-Envelope Cost Analysis
  • At 10,000 users with ~1,000 RPS, assuming each request is 1 KB, bandwidth needed is ~1 MB/s.
  • Storage for policy data and logs grows from a few MBs at 100 users to GBs at 10,000 users.
  • Refund processing requires additional compute resources; asynchronous processing reduces peak load.
  • Database must handle up to 1,000 QPS at medium scale; plan for replicas and sharding accordingly.
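
The bandwidth estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch: the function names are hypothetical, 1 MB is taken as 1,000 KB, and the daily-volume figure additionally assumes that every request is stored as a 1 KB record at sustained peak load, which makes it an upper bound rather than a forecast.

```python
SECONDS_PER_DAY = 86_400


def bandwidth_mb_per_s(rps: int, request_kb: float) -> float:
    """Inbound bandwidth in MB/s, taking 1 MB = 1,000 KB."""
    return rps * request_kb / 1_000


def daily_volume_gb(rps: int, record_kb: float) -> float:
    """Data written per day in GB if every request stores one record (assumption)."""
    return rps * record_kb * SECONDS_PER_DAY / 1_000_000


medium_bandwidth = bandwidth_mb_per_s(1_000, 1.0)   # medium scale from the growth table
medium_daily_gb = daily_volume_gb(1_000, 1.0)
print(medium_bandwidth, medium_daily_gb)
```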
Interview Tip

Start by identifying key components: user requests, policy data, refund transactions. Discuss expected load and data growth. Identify bottlenecks early, usually the database. Propose scaling solutions step-by-step: caching, read replicas, horizontal scaling, sharding, and asynchronous processing. Always justify why each solution fits the bottleneck.

Self Check

Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas to distribute read queries and implement caching for static policy data to reduce database load before considering sharding or more complex solutions.

Key Result
The database is the first bottleneck as user and request volume grows; scaling starts with read replicas and caching, then moves to sharding and asynchronous processing for large-scale cancellation and refund systems.
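
The asynchronous-processing step can be sketched with an in-process queue and one background worker. This is only an illustration under stated assumptions: a production system would more likely use a durable message broker, and the names `handle_cancellation`, `refund_worker`, and `processed` are hypothetical.

```python
import queue
import threading

refund_queue = queue.Queue()
processed = []          # stand-in for refunds completed by a payment gateway


def handle_cancellation(request_id: str, amount: float) -> str:
    """Request path: enqueue the refund and return without waiting for it."""
    refund_queue.put({"request_id": request_id, "amount": amount})
    return "refund_pending"


def refund_worker() -> None:
    """Background path: drain the queue and do the slow refund work."""
    while True:
        job = refund_queue.get()
        if job is None:                       # sentinel: stop the worker
            refund_queue.task_done()
            break
        processed.append(job["request_id"])   # placeholder for the real refund call
        refund_queue.task_done()


worker = threading.Thread(target=refund_worker, daemon=True)
worker.start()
status = handle_cancellation("req-1", 49.99)
refund_queue.put(None)      # ask the worker to stop after the pending job
refund_queue.join()         # wait until every queued item has been handled
```

The request handler only pays the cost of an enqueue, so peak-time refund bursts are smoothed out by the worker instead of hitting the database synchronously.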

Practice

1. What is the primary purpose of a cancellation and refund policy in a system?
easy
A. To define rules for stopping services and returning money
B. To increase the price of products
C. To track user login times
D. To manage database backups

Solution

  1. Step 1: Understand the role of cancellation policies

    Cancellation and refund policies set clear rules about when and how users can stop services and get money back.
  2. Step 2: Eliminate unrelated options

    Options about pricing, login times, or backups do not relate to cancellation or refunds.
  3. Final Answer:

    To define rules for stopping services and returning money -> Option A
  4. Quick Check:

    Cancellation policy = service stop rules [OK]
Hint: Cancellation policies define service stop and refund rules
Common Mistakes:
  • Confusing cancellation policy with pricing strategy
  • Thinking it manages user authentication
  • Assuming it handles technical backups
2. Which of the following is a correct component to include in a cancellation policy data model?
easy
A. login_attempts: int
B. user_password: string
C. product_price: float
D. allowed_cancellation_time: datetime

Solution

  1. Step 1: Identify relevant data for cancellation policy

    The allowed cancellation time defines until when a user can cancel and get a refund.
  2. Step 2: Exclude unrelated fields

    User password, product price, and login attempts are unrelated to cancellation timing.
  3. Final Answer:

    allowed_cancellation_time: datetime -> Option D
  4. Quick Check:

    Cancellation policy needs cancellation time [OK]
Hint: Cancellation policy needs allowed cancellation time field
Common Mistakes:
  • Including unrelated user or product fields
  • Confusing cancellation time with login data
  • Using incorrect data types for time
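
As a rough illustration of the data model in this question, the sketch below stores the cutoff as a `datetime`. Only `allowed_cancellation_time` comes from the question itself. The class name, `policy_id`, `refund_percentage`, and `is_cancellable` are hypothetical additions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CancellationPolicy:
    policy_id: str                         # hypothetical identifier
    allowed_cancellation_time: datetime    # field named in the question
    refund_percentage: int = 100           # hypothetical extra field

    def is_cancellable(self, cancellation_time: datetime) -> bool:
        """True if the cancellation arrives at or before the cutoff."""
        return cancellation_time <= self.allowed_cancellation_time


policy = CancellationPolicy("p1", datetime(2024, 1, 10, 12, 0))
print(policy.is_cancellable(datetime(2024, 1, 9, 12, 0)))
```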
3. Given this pseudocode for refund calculation:
if cancellation_time <= allowed_cancellation_time:
    refund_amount = full_price
else:
    refund_amount = 0
print(refund_amount)

What will be printed if cancellation_time is after allowed_cancellation_time?
medium
A. Error
B. full_price
C. 0
D. null

Solution

  1. Step 1: Analyze the condition

    If cancellation_time is after allowed_cancellation_time, the else branch runs.
  2. Step 2: Determine refund amount

    In else, refund_amount is set to 0, so 0 will be printed.
  3. Final Answer:

    0 -> Option C
  4. Quick Check:

    Late cancellation = zero refund [OK]
Hint: Late cancellations get zero refund
Common Mistakes:
  • Assuming refund is full regardless of time
  • Expecting an error due to condition
  • Confusing variable names
4. Identify the bug in this refund policy code snippet:
def calculate_refund(cancellation_time, allowed_time, price):
    if cancellation_time > allowed_time:
        refund = price
    else:
        refund = 0
    return refund
medium
A. Price variable is not used
B. Refund is given after allowed time instead of before
C. Function does not return any value
D. Refund is always zero

Solution

  1. Step 1: Understand refund logic

    Refund should be given if cancellation_time is before or equal to allowed_time.
  2. Step 2: Check condition logic

    Current code gives refund if cancellation_time is after allowed_time, which is incorrect.
  3. Final Answer:

    Refund is given after allowed time instead of before -> Option B
  4. Quick Check:

    Refund condition reversed = bug [OK]
Hint: Refund condition must check cancellation before allowed time
Common Mistakes:
  • Reversing the refund condition
  • Ignoring return statement
  • Misusing price variable
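
For comparison, one possible corrected version of the snippet simply flips the comparison so that the refund is paid at or before the cutoff. The type hints and the use of `datetime` values are assumptions, since the question does not state the argument types.

```python
from datetime import datetime


def calculate_refund(cancellation_time: datetime, allowed_time: datetime, price: float) -> float:
    """Full refund at or before the cutoff, nothing afterwards."""
    if cancellation_time <= allowed_time:
        return price
    return 0.0


cutoff = datetime(2024, 1, 10, 12, 0)
print(calculate_refund(datetime(2024, 1, 10, 11, 0), cutoff, 80.0))
print(calculate_refund(datetime(2024, 1, 10, 13, 0), cutoff, 80.0))
```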
5. You are designing a cancellation and refund system for an online booking platform. Which approach best balances user trust and system scalability?
hard
A. Allow partial refund based on how close cancellation is to booking time
B. Allow full refund anytime, no restrictions
C. Allow full refund only if cancellation is made 24 hours before booking time, else no refund
D. Never allow refunds to avoid complexity

Solution

  1. Step 1: Consider user trust

    Partial refunds based on cancellation timing show fairness and flexibility, building trust.
  2. Step 2: Consider system scalability

    Partial refund rules can be implemented with clear logic and scale well without manual intervention.
  3. Step 3: Evaluate other options

    Full refund anytime is costly; no refunds reduce trust; strict cutoff is less flexible.
  4. Final Answer:

    Allow partial refund based on how close cancellation is to booking time -> Option A
  5. Quick Check:

    Partial refund balances trust and scalability [OK]
Hint: Partial refunds balance fairness and system load best
Common Mistakes:
  • Choosing no refund which harms user trust
  • Allowing full refund anytime which is costly
  • Using strict cutoff without flexibility
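
A time-based partial refund like Option A can be expressed as a small tier table. The sketch below is illustrative only: the tier thresholds (48 and 24 hours) and percentages are invented for the example, and amounts are kept in integer cents to avoid floating-point rounding.

```python
from datetime import datetime

# Hypothetical tiers: (minimum hours before the booking, refund percentage),
# ordered from most to least generous.
REFUND_TIERS = [(48, 100), (24, 50), (0, 0)]


def partial_refund_cents(price_cents: int, booking_time: datetime,
                         cancellation_time: datetime) -> int:
    """Refund in cents, based on how many hours remain before the booking."""
    hours_left = (booking_time - cancellation_time).total_seconds() / 3600
    for min_hours, percent in REFUND_TIERS:
        if hours_left >= min_hours:
            return price_cents * percent // 100
    return 0    # cancellation after the booking time


booking = datetime(2024, 1, 10, 12, 0)
print(partial_refund_cents(10_000, booking, datetime(2024, 1, 7, 12, 0)))
```

Because the rule is a pure function of two timestamps and a price, it needs no manual review and can run on any application server, which is what lets it scale.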