| Users | Transactions/day | Data Size (per day, ~10 KB/transaction) | System Changes |
|---|---|---|---|
| 100 | 1,000 | ~10 MB | Single server, simple DB, no caching needed |
| 10,000 | 100,000 | ~1 GB | DB indexing, read replicas, basic caching |
| 1,000,000 | 10,000,000 | ~100 GB | Sharded DB, distributed cache, horizontal app scaling |
| 100,000,000 | 1,000,000,000 | ~10 TB | Multi-region sharding, archival storage, CDN for UI data |

Transaction history in LLD - Scalability & System Analysis

The database is the first bottleneck as transaction volume grows. It struggles with write throughput and query latency because transaction history requires frequent writes and complex queries for user statements.
- Read Replicas: Offload read queries to replicas to reduce load on primary DB.
- Caching: Use in-memory caches (e.g., Redis) for recent or frequent queries.
- Sharding: Split data by user ID or time range to distribute load across multiple DB instances.
- Horizontal Scaling: Add more application servers behind load balancers to handle increased traffic.
- Archival Storage: Move old transactions to cheaper, slower storage to keep main DB performant.
- CDN: Use for static UI assets and possibly precomputed reports to reduce server load.
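
As a rough illustration of the sharding bullet above, the sketch below routes each user to one of a fixed number of shards by hashing the user ID. The shard count, the `shard_for_user` name, and the choice of SHA-256 are assumptions made for this example; the original design does not specify them.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, chosen only for illustration


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash."""
    # Python's built-in hash() is randomized per process for strings, so a
    # digest is used to get the same answer on every application server.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All transactions for one user map to the same shard, so a per-user
# statement query touches a single DB instance.
print(shard_for_user("user-42") == shard_for_user("user-42"))
```

Plain modulo hashing remaps most users when the shard count changes, which is why consistent hashing or a lookup table is often considered when shards need to be added later.
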
- At 1M users with 10 transactions/day: 10M writes/day ≈ 115 writes/sec.
- Database must handle ~200 QPS (including reads).
- Storage: ~100 GB of new transaction data per day (10M transactions × 10 KB per transaction).
- Network bandwidth: ~10 MB/s for data transfer (reads + writes).
- One DB instance can handle ~5,000 QPS, so a single DB can absorb this average load; read replicas become necessary as read traffic and peak load grow toward that limit.
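
The write-rate and storage figures in the list above can be reproduced with a short back-of-envelope script. This is only a sketch using the stated assumptions (1M users, 10 transactions per user per day, 10 KB per transaction); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

users = 1_000_000
tx_per_user_per_day = 10
bytes_per_tx = 10 * 1000  # 10 KB per transaction, as assumed in the list

writes_per_day = users * tx_per_user_per_day
writes_per_sec = writes_per_day / SECONDS_PER_DAY
storage_per_day_gb = writes_per_day * bytes_per_tx / 1e9

print(f"writes/day:  {writes_per_day:,}")
print(f"writes/sec:  {writes_per_sec:.1f}")
print(f"storage/day: {storage_per_day_gb:.0f} GB")
```

Under these assumptions the 100 GB figure is the volume added each day, so retention length matters as much as raw throughput when sizing storage.
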
Start by estimating user and transaction volume. Identify the bottleneck (usually DB). Discuss scaling steps in order: caching, read replicas, sharding, horizontal app scaling. Mention trade-offs like consistency and latency. Use real numbers to justify choices.
Your database handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?
Answer: Add read replicas and implement caching to reduce load on the primary database before considering sharding or adding more app servers.
Practice
transaction history in a system?

Solution

Step 1: Understand the role of transaction history
Transaction history stores records of actions with details like timestamps and IDs.

Step 2: Identify the correct purpose
This helps users and systems track past events clearly and reliably.

Final Answer:
To record all important actions with details for tracking -> Option A

Quick Check:
Transaction history purpose = record actions [OK]
- Confusing transaction history with caching
- Thinking it deletes data automatically
- Mixing it with security features like encryption
Solution
Step 1: Identify unique identifiers in transaction history
Unique transaction IDs ensure each record is distinct and traceable.

Step 2: Compare options
Timestamps alone can repeat; user names and amounts are not unique identifiers.

Final Answer:
Using a unique transaction ID -> Option B

Quick Check:
Unique ID = unique transaction record [OK]
- Assuming timestamp alone is unique
- Using user name as unique key
- Using transaction amount as identifier
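
One common way to get a unique transaction ID is a random UUID. The sketch below is a minimal example under that assumption; the `new_transaction` helper and its field names are hypothetical and not part of the original material.

```python
import uuid
from datetime import datetime, timezone


def new_transaction(user_id: str, amount: float) -> dict:
    """Build a transaction record with a globally unique ID."""
    return {
        "id": str(uuid.uuid4()),  # random 128-bit ID; collisions are negligible
        "user_id": user_id,
        "amount": amount,
        "time": datetime.now(timezone.utc).isoformat(),
    }


a = new_transaction("alice", 25.0)
b = new_transaction("alice", 25.0)
# Same user and amount, possibly the same timestamp, but distinct IDs.
print(a["id"] != b["id"])
```
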
transactions = [
    {"id": "t1", "time": "2024-01-01T10:00:00Z"},
    {"id": "t2", "time": "2024-01-01T09:00:00Z"},
    {"id": "t3", "time": "2024-01-01T11:00:00Z"}
]

What is the correct order of transaction IDs if sorted by time ascending?
Solution
Step 1: Analyze timestamps for each transaction
t2 = 09:00, t1 = 10:00, t3 = 11:00 in UTC time.

Step 2: Sort transactions by ascending time
Order is t2 (earliest), then t1, then t3 (latest).

Final Answer:
["t2", "t1", "t3"] -> Option D

Quick Check:
Sorted by time ascending = [t2, t1, t3] [OK]
- Sorting by ID instead of time
- Confusing ascending with descending order
- Ignoring timestamp format
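
The ordering in this question can be checked with a short script. This is a sketch that parses the timestamps before sorting; the `parse_time` helper is an illustrative name, not something from the original question.

```python
from datetime import datetime

transactions = [
    {"id": "t1", "time": "2024-01-01T10:00:00Z"},
    {"id": "t2", "time": "2024-01-01T09:00:00Z"},
    {"id": "t3", "time": "2024-01-01T11:00:00Z"},
]


def parse_time(ts: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


ordered = sorted(transactions, key=lambda t: parse_time(t["time"]))
print([t["id"] for t in ordered])  # ['t2', 't1', 't3']
```

Sorting the raw strings would give the same result here because all timestamps share one ISO 8601 UTC format; parsing is the safer choice when formats or offsets can differ.
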
def add_transaction(history, transaction):
    if transaction['id'] not in [t['id'] for t in history]:
        history.append(transaction)
    else:
        print("Duplicate transaction")

history = [{"id": "t1"}]
add_transaction(history, {"id": "t1"})

What is the output when running this code?
Solution
Step 1: Check if transaction ID exists in history
The code checks if 't1' is already in the list of IDs in history.

Step 2: Since 't1' exists, print duplicate message
The else branch runs and prints "Duplicate transaction".

Final Answer:
Duplicate transaction -> Option A

Quick Check:
Duplicate ID detected = print message [OK]
- Assuming transaction is added anyway
- Expecting an exception instead of print
- Confusing list comprehension syntax
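
The quiz code rebuilds the list of IDs on every call, which costs time proportional to the history length for each insert. A possible variation, sketched below, keeps a set of seen IDs next to the list so the duplicate check is a constant-time lookup on average. The `TransactionHistory` class is a hypothetical name introduced for this example.

```python
class TransactionHistory:
    """Append-only history that rejects duplicate transaction IDs."""

    def __init__(self):
        self._items = []         # transactions in insertion order
        self._seen_ids = set()   # IDs already stored, for fast membership checks

    def add(self, transaction: dict) -> bool:
        """Store the transaction; return False if its ID was already seen."""
        tx_id = transaction["id"]
        if tx_id in self._seen_ids:
            return False
        self._seen_ids.add(tx_id)
        self._items.append(transaction)
        return True

    def __len__(self) -> int:
        return len(self._items)


history = TransactionHistory()
print(history.add({"id": "t1"}))  # True
print(history.add({"id": "t1"}))  # False
print(len(history))               # 1
```
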
Solution
Step 1: Consider scalability and retrieval speed
Scanning one big list or files without index is slow for millions of users.

Step 2: Use database indexing on user ID and timestamp
This allows fast queries to get transactions per user sorted by time efficiently.

Step 3: Avoid in-memory only storage for persistence and scale
Memory-only storage risks data loss and limits scale.

Final Answer:
Use a database with an index on user ID and timestamp -> Option C

Quick Check:
Indexing = fast retrieval at scale [OK]
- Scanning large lists for each query
- Ignoring indexing benefits
- Relying on memory-only storage
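
To make the indexing answer concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, and index name are assumptions for illustration, and the in-memory database is used only to keep the demo self-contained; a real system would use persistent storage, as the solution notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # demo only; not persistent
conn.execute(
    "CREATE TABLE transactions ("
    " id TEXT PRIMARY KEY,"
    " user_id TEXT NOT NULL,"
    " ts TEXT NOT NULL,"  # ISO 8601 UTC strings sort chronologically
    " amount REAL NOT NULL)"
)
# Composite index: equality match on user_id, then ordered by ts.
conn.execute("CREATE INDEX idx_user_ts ON transactions (user_id, ts)")

rows = [
    ("t1", "alice", "2024-01-01T10:00:00Z", 20.0),
    ("t2", "alice", "2024-01-01T09:00:00Z", 5.0),
    ("t3", "bob", "2024-01-01T11:00:00Z", 7.5),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

# Most recent transactions for one user, newest first.
statement = conn.execute(
    "SELECT id FROM transactions WHERE user_id = ? ORDER BY ts DESC LIMIT 10",
    ("alice",),
).fetchall()
print(statement)  # [('t1',), ('t2',)]
```

Putting `user_id` first in the index matches the query shape: the database can jump to one user's rows and read them already ordered by time instead of scanning and sorting the whole table.
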
