| Scale | Users | Transactions per Second (TPS) | Data Storage | Latency Requirements | System Changes |
|---|---|---|---|---|---|
| Small | 100 | 1-5 TPS | Few MBs (transaction logs) | ~1-2 seconds | Single app server, single DB instance, basic logging |
| Medium | 10,000 | 100-500 TPS | GBs (transaction history, user data) | <1 second | Load balancer, multiple app servers, DB read replicas, caching |
| Large | 1,000,000 | 5,000-10,000 TPS | TBs (full transaction history, audit logs) | <500 ms | Sharded DB, distributed cache, message queues, microservices |
| Very Large | 100,000,000 | 100,000+ TPS | Petabytes (archival storage, compliance data) | <200 ms | Multi-region deployment, event-driven architecture, advanced fraud detection, CDN for static content |
Payment handling in LLD - Scalability & System Analysis
At small to medium scale, the database is usually the first bottleneck. Every payment needs ACID-compliant, strongly consistent writes, so a single instance runs out of write and read throughput as transaction volume grows.
As TPS grows, the application servers can also become CPU- and memory-bound because of encryption, validation, and calls to payment gateways.
At very large scale, network bandwidth and data partitioning become the limiting factors, particularly when data must stay consistent across regions while still meeting compliance rules.
- Database scaling: Use read replicas to offload reads, implement sharding to distribute writes, and use connection pooling.
- Caching: Cache non-sensitive data like exchange rates or user preferences to reduce DB load.
- Horizontal scaling: Add more application servers behind a load balancer to handle more concurrent payment requests.
- Message queues: Process non-critical tasks such as notifications or reporting asynchronously, keeping them off the payment request path to reduce latency.
- Microservices: Separate payment processing, fraud detection, and user management into services for independent scaling.
- Network and multi-region: Deploy services closer to users and use CDNs for static content to reduce latency.
- Security and compliance: Use encryption, tokenization, and PCI DSS compliant services to safely handle payment data.
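
As an illustration of the sharding point above, here is a minimal sketch of hash-based shard routing. The shard count, the choice of `user_id` as the shard key, and the `shard_for` helper are all assumptions made for this example, not part of any specific database's API.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, chosen only for illustration


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    because hash() on strings is randomized per process and would route
    the same user to different shards after a restart.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo
    # the shard count to get an index in [0, num_shards).
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, "-> shard", shard_for(uid))
```

All writes for one user land on the same shard, so a user's transaction history can be read without a cross-shard query. Plain modulo hashing remaps most keys when the shard count changes; consistent hashing is a common alternative when shards are added often.
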
- At a sustained 10,000 TPS, expect ~864 million transactions/day (10,000 * 86,400 seconds).
- At ~1 KB per transaction record, that is ~864 GB/day of storage before compression or archival.
- Network bandwidth: 10,000 TPS * 1 KB = ~10 MB/s sustained, plus protocol overhead.
- Application servers: If each server handles ~2,000 concurrent connections, 10,000 concurrent users need at least 5 servers; plan for 5-10 at medium scale to leave headroom.
- Database: Assume a single instance handles ~5,000 QPS; beyond that, add read replicas for reads and sharding for writes.
- Cloud costs scale with storage, compute, and network usage; optimize with caching and archiving.
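
The estimates above can be reproduced with a short calculation. This is a sketch under the same rounded assumptions as the list (1 KB treated as 1,000 bytes, ~2,000 connections per server); the function names are made up for this example.

```python
import math

SECONDS_PER_DAY = 86_400
KB = 1_000  # decimal kilobyte, matching the rounded figures in the list above


def daily_transactions(tps: int) -> int:
    """Transactions per day at a sustained rate of `tps`."""
    return tps * SECONDS_PER_DAY


def daily_storage_gb(tps: int, record_bytes: int = KB) -> float:
    """Raw storage per day in GB, before compression or archival."""
    return daily_transactions(tps) * record_bytes / 1e9


def bandwidth_mb_per_s(tps: int, record_bytes: int = KB) -> float:
    """Sustained payload bandwidth in MB/s, excluding protocol overhead."""
    return tps * record_bytes / 1e6


def servers_needed(concurrent_connections: int, per_server: int = 2_000) -> int:
    """Minimum app servers for the given connection count, with no headroom."""
    return math.ceil(concurrent_connections / per_server)


if __name__ == "__main__":
    print(daily_transactions(10_000))   # transactions per day
    print(daily_storage_gb(10_000))     # GB per day
    print(bandwidth_mb_per_s(10_000))   # MB per second
    print(servers_needed(10_000))       # minimum server count
```

Changing the record size or per-server capacity shows quickly how sensitive the storage and server counts are to those two assumptions.
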
Start by clarifying the expected scale and latency requirements.
Identify the critical components: payment gateway, database, app servers.
Discuss bottlenecks at each scale and propose targeted solutions.
Highlight security and compliance as non-negotiable constraints.
Use real numbers to justify your scaling choices.
Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?
Answer: First add read replicas to spread the read load, and use connection pooling so the extra traffic does not exhaust database connections. For writes, consider sharding or partitioning to distribute the write load. The database saturates before the app servers do, so fix that bottleneck first.
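
To put numbers behind that answer, a rough sizing sketch follows. It assumes an 80% read share (a hypothetical figure, not given in the question), that every instance sustains the same QPS, and that replicas serve only reads while primaries serve only writes. The `plan_capacity` helper is invented for this example.

```python
import math


def plan_capacity(total_qps: int, read_percent: int, per_instance_qps: int):
    """Return (read_replicas, write_shards) needed for the given load.

    Simplification: reads go only to replicas and writes only to primaries,
    with no headroom and no replication overhead accounted for.
    """
    read_qps = total_qps * read_percent / 100
    write_qps = total_qps - read_qps
    read_replicas = math.ceil(read_qps / per_instance_qps)
    write_shards = math.ceil(write_qps / per_instance_qps)
    return read_replicas, write_shards


if __name__ == "__main__":
    replicas, shards = plan_capacity(10_000, read_percent=80, per_instance_qps=1_000)
    print(f"read replicas: {replicas}, write shards: {shards}")
```

Under these assumptions the read side needs far more instances than the write side, which is why replicas come first; the write load only forces sharding once it exceeds what one primary can sustain.
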
