
Cancellation and refund policy in LLD - Scalability & System Analysis

Growth Table: Cancellation and Refund Policy System
| Scale | Users | Requests per Second (RPS) | Data Storage | System Changes |
|---|---|---|---|---|
| Small | 100 | ~10 RPS | Few MBs (policy rules, logs) | Single server, simple DB, no caching |
| Medium | 10,000 | ~1,000 RPS | GBs (policy versions, user requests) | DB read replicas, caching, load balancer |
| Large | 1,000,000 | ~50,000 RPS | TBs (logs, audit trails, refunds) | Sharded DB, distributed cache, microservices |
| Very Large | 100,000,000 | ~5,000,000 RPS | Petabytes (archived data, analytics) | Multi-region deployment, CDN, event-driven architecture |
First Bottleneck

At small to medium scale, the database becomes the first bottleneck. This is because all cancellation and refund requests require consistent reads and writes to policy data and transaction records. The DB must handle many concurrent queries and updates, especially during peak refund periods.
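To see why every request lands on the database, here is a minimal sketch using an in-memory SQLite database and a hypothetical schema (the `policy` and `refunds` tables and the 80% refund rule are illustrative assumptions, not part of any real system): each refund needs a consistent policy read plus a transactional write, so every request holds a DB connection.

```python
import sqlite3

# Hypothetical schema: one policy table, one refunds table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (item TEXT PRIMARY KEY, refund_pct INTEGER)")
conn.execute("CREATE TABLE refunds (order_id TEXT PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO policy VALUES ('standard', 80)")  # assumed 80% refund rule

def process_refund(order_id: str, paid: float) -> float:
    """Read the policy and record the refund in one transaction."""
    with conn:  # commits on success, rolls back on error
        (pct,) = conn.execute(
            "SELECT refund_pct FROM policy WHERE item = 'standard'"
        ).fetchone()
        amount = paid * pct / 100
        conn.execute("INSERT INTO refunds VALUES (?, ?)", (order_id, amount))
    return amount

print(process_refund("order-1", 50.0))  # 40.0
```

Because the read and write must stay consistent, this work cannot simply be cached away, which is why the database saturates first under peak refund traffic.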

Scaling Solutions
  • Database Scaling: Use read replicas to distribute read load. Implement connection pooling to manage DB connections efficiently.
  • Caching: Cache static policy rules and frequently accessed refund statuses to reduce DB hits.
  • Horizontal Scaling: Add more application servers behind a load balancer to handle increased request volume.
  • Sharding: Partition the database by user ID or region to distribute write load and improve performance.
  • Event-Driven Architecture: Use message queues to process refunds asynchronously, reducing synchronous DB load.
  • Multi-Region Deployment: Deploy services closer to users to reduce latency and distribute traffic.
Back-of-Envelope Cost Analysis
  • At 10,000 users with ~1,000 RPS, assuming each request is 1 KB, bandwidth needed is ~1 MB/s.
  • Storage for policy data and logs grows from MBs to GBs as users increase.
  • Refund processing requires additional compute resources; asynchronous processing reduces peak load.
  • Database must handle up to 1,000 QPS at medium scale; plan for replicas and sharding accordingly.
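The arithmetic behind these estimates is simple enough to verify directly. The 1 KB request size is the assumption stated above; the daily-log figure additionally assumes every request is logged at the same size.

```python
# Back-of-envelope check of the medium-scale figures.
rps = 1_000          # requests per second at medium scale
request_kb = 1       # assumed payload size per request, in KB

bandwidth_kb_s = rps * request_kb          # 1,000 KB/s
bandwidth_mb_s = bandwidth_kb_s / 1_000    # ~1 MB/s, matching the estimate above
print(bandwidth_mb_s)  # 1.0

# If every request were also logged at 1 KB:
daily_log_gb = rps * request_kb * 86_400 / 1_000_000  # ~86 GB/day
```

A calculation like this is worth doing aloud in an interview: it shows that bandwidth is trivial at this scale, while log and audit storage, not network, is what grows toward the GB-to-TB range in the table.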
Interview Tip

Start by identifying key components: user requests, policy data, refund transactions. Discuss expected load and data growth. Identify bottlenecks early, usually the database. Propose scaling solutions step-by-step: caching, read replicas, horizontal scaling, sharding, and asynchronous processing. Always justify why each solution fits the bottleneck.

Self Check

Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas to distribute read queries and implement caching for static policy data to reduce database load before considering sharding or more complex solutions.
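The "add read replicas first" answer implies read/write splitting in the application layer. A minimal routing sketch, with a hypothetical `Router` class and plain string labels standing in for real connections:

```python
import random

class Router:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def route(self, sql: str):
        """Return the connection that should run this statement."""
        first_word = sql.lstrip().split()[0].upper()
        if first_word in {"INSERT", "UPDATE", "DELETE"}:
            return self.primary               # writes must hit the primary
        return random.choice(self.replicas)   # reads load-balance across replicas

# Labels stand in for real DB connections in this sketch.
router = Router(primary="primary", replicas=["replica-1", "replica-2"])
print(router.route("UPDATE refunds SET status = 'done'"))  # primary
```

Since cancellation and refund workloads are read-heavy (policy lookups, status checks), this splits off most of the 10x traffic growth before any sharding is needed.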

Key Result
The database is the first bottleneck as user and request volume grows; scaling starts with read replicas and caching, then moves to sharding and asynchronous processing for large-scale cancellation and refund systems.