| Scale | Users | Transactions per Second (TPS) | Data Storage | Latency Requirements | System Changes |
|---|---|---|---|---|---|
| Small | 100 | 1-5 TPS | Few MBs (transaction logs) | ~1-2 seconds | Single app server, single DB instance, basic logging |
| Medium | 10,000 | 100-500 TPS | GBs (transaction history, user data) | <1 second | Load balancer, multiple app servers, DB read replicas, caching |
| Large | 1,000,000 | 5,000-10,000 TPS | TBs (full transaction history, audit logs) | <500 ms | Sharded DB, distributed cache, message queues, microservices |
| Very Large | 100,000,000 | 100,000+ TPS | Petabytes (archival storage, compliance data) | <200 ms | Multi-region deployment, event-driven architecture, advanced fraud detection, CDN for static content |
Payment handling in LLD - Scalability & System Analysis
At small to medium scale, the database is the first bottleneck. It struggles to handle increasing transaction writes and reads, especially with ACID compliance and consistency requirements.
As TPS grows, the application servers may also become CPU and memory constrained due to encryption, validation, and communication with payment gateways.
At very large scale, network bandwidth and data partitioning challenges arise, especially for cross-region consistency and compliance.
- Database scaling: Use read replicas to offload reads, implement sharding to distribute writes, and use connection pooling.
- Caching: Cache non-sensitive data like exchange rates or user preferences to reduce DB load.
- Horizontal scaling: Add more application servers behind a load balancer to handle more concurrent payment requests.
- Message queues: Use asynchronous processing for non-critical tasks like notifications or reporting to reduce latency.
- Microservices: Separate payment processing, fraud detection, and user management into services for independent scaling.
- Network and multi-region: Deploy services closer to users and use CDNs for static content to reduce latency.
- Security and compliance: Use encryption, tokenization, and PCI DSS compliant services to safely handle payment data.
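
As a rough illustration of the caching bullet above, the sketch below keeps exchange rates in a small in-process cache with a time-to-live, so repeated lookups skip the database. The names `TTLCache` and `load_rate_from_db`, the 60-second TTL, and the sample rate are all assumptions made for this example; a production system would more likely use a shared cache such as the distributed cache listed in the table.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expires_at, value)
        self._store: Dict[str, Tuple[float, float]] = {}

    def get(self, key: str, loader: Callable[[str], float]) -> float:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no database call
        value = loader(key)            # miss or expired: go to the database
        self._store[key] = (now + self._ttl, value)
        return value


db_calls = []


def load_rate_from_db(pair: str) -> float:
    # Hypothetical stand-in for a real database query; the rate is made up.
    db_calls.append(pair)
    return {"USD/EUR": 0.92}.get(pair, 1.0)


rates = TTLCache(ttl_seconds=60)
first = rates.get("USD/EUR", load_rate_from_db)
second = rates.get("USD/EUR", load_rate_from_db)  # served from the cache
```

Only non-sensitive values are cached here, in line with the bullet above; card data should not be stored this way.
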
- At 10,000 TPS, expect ~864 million transactions/day.
- Each transaction record ~1 KB -> ~864 GB/day storage needed before compression or archival.
- Network bandwidth: 10,000 TPS * 1 KB = ~10 MB/s sustained, plus overhead.
- Application servers: Each handles ~2,000 concurrent connections, so 10,000 concurrent users at medium scale need at least 5 servers, or up to 10 with headroom.
- Database: Single instance handles ~5,000 QPS; need replicas and sharding beyond that.
- Cloud costs scale with storage, compute, and network usage; optimize with caching and archiving.
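
The estimates above are simple arithmetic and can be reproduced with a short script. This is a minimal sketch that assumes 1 KB means 1,000 bytes (to match the round figures in the list); the function names `estimate_capacity` and `servers_needed` are made up for this example.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_capacity(tps: int, record_bytes: int = 1_000) -> dict:
    """Back-of-envelope daily volume, storage, and bandwidth for a given TPS."""
    transactions_per_day = tps * SECONDS_PER_DAY
    return {
        "transactions_per_day": transactions_per_day,
        "storage_gb_per_day": transactions_per_day * record_bytes / 1e9,
        "bandwidth_mb_per_s": tps * record_bytes / 1e6,
    }


def servers_needed(concurrent_connections: int, per_server: int = 2_000) -> int:
    """Minimum server count, rounded up, with no headroom included."""
    return math.ceil(concurrent_connections / per_server)


estimate = estimate_capacity(10_000)
# estimate["transactions_per_day"] is 864,000,000
# estimate["storage_gb_per_day"] is 864.0
# estimate["bandwidth_mb_per_s"] is 10.0
minimum_servers = servers_needed(10_000)  # 5
```
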
Start by clarifying the expected scale and latency requirements.
Identify the critical components: payment gateway, database, app servers.
Discuss bottlenecks at each scale and propose targeted solutions.
Highlight security and compliance as non-negotiable constraints.
Use real numbers to justify your scaling choices.
Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: Add read replicas to distribute read load and implement connection pooling. For writes, consider sharding or partitioning to distribute write load. This addresses the database bottleneck before scaling app servers.
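
One way to picture the answer above is a router that sends writes to a shard's primary and spreads reads across that shard's replicas. The sketch below is illustrative only: `ShardedRouter`, the server names, and the hash-modulo shard choice are assumptions for this example, not a prescribed design.

```python
import hashlib
import itertools
from typing import Dict, List


class ShardedRouter:
    """Picks a database node for each query based on the user ID."""

    def __init__(self, shards: Dict[str, List[str]]):
        # shards maps each primary's name to the names of its read replicas
        # (each replica list is assumed to be non-empty).
        self._primaries = list(shards)
        self._replica_cycles = {
            primary: itertools.cycle(replicas)
            for primary, replicas in shards.items()
        }

    def _primary_for(self, user_id: str) -> str:
        # A stable hash keeps each user on the same shard across processes.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._primaries)
        return self._primaries[index]

    def for_write(self, user_id: str) -> str:
        return self._primary_for(user_id)

    def for_read(self, user_id: str) -> str:
        # Round-robin over the replicas of the user's own shard.
        return next(self._replica_cycles[self._primary_for(user_id)])


router = ShardedRouter({
    "primary-0": ["primary-0-replica-a", "primary-0-replica-b"],
    "primary-1": ["primary-1-replica-a", "primary-1-replica-b"],
})
write_target = router.for_write("user-42")
read_target = router.for_read("user-42")
```

Hash-modulo sharding is the simplest choice to show, but changing the number of shards remaps most keys, so real deployments often use consistent hashing or a lookup table instead.
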
Practice
Solution
Step 1: Understand the role of payment handling
Payment handling systems focus on managing money transfers safely and reliably.
Step 2: Identify the core function
The core function is to process payments securely and keep records of transactions.
Final Answer:
To securely process and record financial transactions -> Option B
Quick Check:
Payment handling = Secure transaction processing [OK]
- Confusing payment handling with user authentication
- Thinking it manages product display
- Assuming it stores user media files
Solution
Step 1: Identify logical payment flow order
First, payment details must be validated to ensure correctness.
Step 2: Follow with processing, recording, and notifying
After validation, payment is processed, transaction recorded, then user notified.
Final Answer:
Validate payment details -> Process payment -> Record transaction -> Notify user -> Option C
Quick Check:
Payment flow = Validate -> Process -> Record -> Notify [OK]
- Not validating before processing
- Not recording transaction before notifying
- Mixing notification before processing
```python
def process_payment(amount, card_info):
    if not validate_card(card_info):
        return "Invalid card"
    if amount <= 0:
        return "Invalid amount"
    if not charge_card(card_info, amount):
        return "Charge failed"
    record_transaction(card_info, amount)
    return "Payment successful"
```
What will be the output of process_payment(100, 'expired_card') if validate_card returns False for expired cards?
Solution
Step 1: Check card validation result
Since validate_card returns False for expired cards, the first if condition triggers.
Step 2: Return error message immediately
The function returns "Invalid card" without further processing.
Final Answer:
"Invalid card" -> Option A
Quick Check:
Expired card -> validate_card = False -> "Invalid card" [OK]
- Assuming charge_card runs despite invalid card
- Confusing invalid amount with invalid card
- Expecting success despite validation failure
Solution
Step 1: Identify cause of duplicate logs
Retries cause repeated transaction records without uniqueness checks.
Step 2: Implement unique transaction IDs and check
Assign unique IDs and verify before logging to avoid duplicates.
Final Answer:
Use unique transaction IDs and check before recording -> Option A
Quick Check:
Unique IDs prevent duplicate transaction logs [OK]
- Ignoring duplicate checks on retries
- Removing logging which loses audit trail
- Increasing timeout doesn't fix duplicates
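
The fix described in this solution can be sketched as a log that refuses to record the same transaction ID twice. This is a minimal in-memory illustration; `TransactionLog` and its methods are hypothetical names, and a real system would typically enforce uniqueness in the database (for example with a unique constraint) so that the check and the insert are atomic.

```python
import uuid
from typing import Dict


class TransactionLog:
    """In-memory log that stores each transaction ID at most once."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def record(self, transaction_id: str, amount: int) -> bool:
        """Store the transaction; return False if the ID was already recorded."""
        if transaction_id in self._records:
            return False               # a retry: skip the duplicate entry
        self._records[transaction_id] = {"amount": amount}
        return True

    def __len__(self) -> int:
        return len(self._records)


log = TransactionLog()
# The ID is generated once per payment and reused on every retry.
txn_id = str(uuid.uuid4())
results = [log.record(txn_id, 100) for _ in range(3)]
# results is [True, False, False] and the log holds a single entry
```
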
Solution
Step 1: Analyze scalability and latency needs
Handling 10,000 TPS requires distributing load and minimizing blocking.
Step 2: Choose asynchronous distributed processing
Using a message queue with multiple workers allows parallel processing and reliability.
Step 3: Eliminate options causing bottlenecks or insecurity
Single server or sequential DB processing causes bottlenecks; client-side processing lacks security.
Final Answer:
Use a distributed message queue to process payments asynchronously with multiple worker nodes -> Option D
Quick Check:
High TPS + low latency = distributed async processing [OK]
- Using single server causing bottlenecks
- Sequential DB processing slowing throughput
- Relying on client-side payment processing
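
The queue-and-workers pattern from this solution can be shown in miniature with Python's standard `queue` and `threading` modules. This is only a single-process sketch: the in-memory queue stands in for a distributed message broker, threads stand in for worker nodes, and `run_workers` and the payment dictionaries are names assumed for the example.

```python
import queue
import threading
from typing import List, Optional


def run_workers(payments: List[dict], worker_count: int = 4) -> List[str]:
    """Process payments in parallel and return the IDs that were handled."""
    jobs: "queue.Queue[Optional[dict]]" = queue.Queue()
    processed: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            payment = jobs.get()
            try:
                if payment is None:    # sentinel: no more work for this worker
                    return
                # Placeholder for the real charge and record steps.
                with lock:
                    processed.append(payment["id"])
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for payment in payments:
        jobs.put(payment)
    for _ in threads:
        jobs.put(None)                 # one sentinel per worker
    jobs.join()                        # wait until every item is handled
    for thread in threads:
        thread.join()
    return processed


done = run_workers([{"id": f"txn-{i}"} for i in range(20)])
```

The order of `done` is not guaranteed, since workers finish independently. The caller enqueues work and moves on, which is what keeps request latency low while throughput scales with the number of workers.
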
