Why distributed databases handle scale in DBMS Theory - Performance Analysis
When databases grow large, how fast they handle data matters a lot.
We want to see how distributed databases manage more data and users efficiently.
Analyze the time complexity of this simplified distributed query process.
-- Assume data is split across 3 servers
SELECT * FROM users WHERE age > 30;
-- Query runs on each server in parallel
-- Results are combined and returned
This shows a query running on multiple servers at once, then merging results.
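The scatter-and-merge flow can be sketched in Python. This is a minimal illustration, not how any particular database implements it: the `shards` list, the sample rows, and the function names are hypothetical, and threads stand in for separate servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "servers": each list is one partition of the users table.
shards = [
    [{"name": "Ana", "age": 25}, {"name": "Ben", "age": 41}],
    [{"name": "Cho", "age": 33}, {"name": "Dev", "age": 19}],
    [{"name": "Eli", "age": 58}, {"name": "Fay", "age": 30}],
]

def scan_shard(rows):
    """Local scan on one server: the equivalent of WHERE age > 30."""
    return [row for row in rows if row["age"] > 30]

def distributed_query(shards):
    # Scatter: run the same scan on every shard concurrently.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(pool.map(scan_shard, shards))
    # Gather: concatenate the partial results into one result set.
    return [row for part in partial_results for row in part]

result = distributed_query(shards)
print([row["name"] for row in result])  # ['Ben', 'Cho', 'Eli'] (Fay is exactly 30, so excluded)
```

Each call to `scan_shard` touches only its own partition, which is where the per-server cost analyzed in this section comes from.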
Look for repeated work done by the system.
- Primary operation: Each server scans its part of the data.
- How many times: Once per server, all running at the same time.
As total data grows, it is split across k servers (k = 3 in this example), so each server handles approximately n/k data items.
| Input Size (n) | Approx. Operations per Server (n / 3) |
|---|---|
| 10,000 | ~3,333 |
| 100,000 | ~33,333 |
| 1,000,000 | ~333,333 |
Pattern observation: Total work grows linearly with data size, but each server scans only its own share of about n/k items, so per-server work is k times smaller than the total at every input size.
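The per-server figures for three servers can be reproduced with a few lines of Python. The helper name `rows_per_server` is made up for this sketch, and it assumes a perfectly even split.

```python
def rows_per_server(n, k):
    """Rows each server scans when n rows are split evenly across k servers."""
    return n // k

for n in (10_000, 100_000, 1_000_000):
    print(f"{n:,} rows -> ~{rows_per_server(n, 3):,} per server")
```

Multiplying n by ten multiplies each server's share by ten as well (up to rounding), which is the linear growth described above.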
Time Complexity: O(n / k)
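This means the work per server grows with data size divided by the number of servers, so adding servers helps handle more data efficiently. For a fixed k the growth is still linear in n; the gain is the constant factor 1/k, and the cost of merging results is not counted in this figure.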
[X] Wrong: "Adding more servers always makes queries instantly faster."
[OK] Correct: Some work, such as combining results and waiting on the network, still takes time, so speed improves, but not without limit.
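A toy cost model makes the limit visible. Everything here is an assumption chosen for illustration: the function `query_time`, the `scan_cost` of 1 unit per row, and the `coordination_cost` of 500 units per server are not measurements of any real system.

```python
def query_time(n, k, scan_cost=1.0, coordination_cost=500.0):
    """Hypothetical model: parallel scan time plus a per-server coordination cost."""
    return (n / k) * scan_cost + k * coordination_cost

n = 1_000_000
baseline = query_time(n, 1)
for k in (1, 2, 4, 8, 16, 32, 64):
    speedup = baseline / query_time(n, k)
    print(f"k={k:>2}  speedup={speedup:.2f}")
```

Under these made-up constants the speedup always stays below k, and going from 32 to 64 servers makes the modeled query slightly slower because the coordination term starts to dominate.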
Understanding how distributed databases split work helps you explain real systems that handle big data smoothly.
What if the data were not evenly split across servers? How would that affect the time complexity?
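One way to explore this question: when scans run in parallel, the query finishes only when the slowest server does, so the largest partition sets the running time. The sketch below assumes scan time is proportional to partition size; the shard sizes are invented.

```python
def parallel_scan_time(shard_sizes):
    """With parallel scans, the query waits for the largest shard."""
    return max(shard_sizes)

even = [333_334, 333_333, 333_333]    # 1,000,000 rows, balanced
skewed = [800_000, 100_000, 100_000]  # 1,000,000 rows, one hot shard

print(parallel_scan_time(even))    # 333334
print(parallel_scan_time(skewed))  # 800000
```

In the extreme case where one server holds almost all the rows, the largest shard approaches n and the running time drifts from O(n / k) back toward O(n).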
Practice
Solution
Step 1: Understand the concept of distributed databases
Distributed databases store data on many computers instead of just one.
Step 2: Recognize how spreading data helps scale
Spreading data and workload means many machines share the work, so the system can handle more data and users.
Final Answer:
Because they spread data and workload across multiple machines -> Option A
Quick Check:
Distributed databases = spread data/workload = better scale [OK]
- Thinking a single powerful computer is enough
- Believing data stored in one place scales well
- Assuming limiting users improves scaling
Solution
Step 1: Identify how reliability is improved in distributed systems
Reliability means data is safe and accessible even if one machine fails.
Step 2: Understand data replication
Replicating data means copying it to multiple machines, so if one fails, others still have the data.
Final Answer:
They replicate data across multiple nodes -> Option B
Quick Check:
Replication = data copies = better reliability [OK]
- Thinking storing data on one server improves reliability
- Confusing deleting data with reliability
- Believing restricting users improves reliability
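Replication can be illustrated with a toy key-value store. This is a sketch under strong assumptions: the class `ReplicatedStore` is hypothetical, every write goes to every live node, and there is no recovery or consistency protocol.

```python
class ReplicatedStore:
    """Toy key-value store that copies every write to all live nodes."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]
        self.alive = [True] * node_count

    def put(self, key, value):
        # Write the value to every node that is still up.
        for i, node in enumerate(self.nodes):
            if self.alive[i]:
                node[key] = value

    def get(self, key):
        # Read from the first live node that has the key.
        for i, node in enumerate(self.nodes):
            if self.alive[i] and key in node:
                return node[key]
        raise KeyError(key)

    def fail(self, index):
        # Simulate a node crash.
        self.alive[index] = False

store = ReplicatedStore(node_count=3)
store.put("user:1", "Ana")
store.fail(0)
print(store.get("user:1"))  # Ana
```

The read still succeeds after node 0 fails because the other two nodes hold copies; the data becomes unreachable only once every replica is down.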
Solution
Step 1: Understand capacity per node
Each node can handle 1000 queries per second.
Step 2: Calculate total capacity by adding all nodes
4 nodes x 1000 queries = 4000 queries per second total capacity.
Final Answer:
4000 queries per second -> Option C
Quick Check:
4 x 1000 = 4000 queries/sec [OK]
- Using capacity of one node as total
- Dividing instead of multiplying
- Adding extra queries beyond node capacity
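The arithmetic in this solution, written out as code. The helper `total_capacity` is a made-up name, and the result is an ideal upper bound that assumes perfectly even load and no coordination overhead.

```python
def total_capacity(nodes, queries_per_node):
    """Ideal aggregate throughput: per-node capacity times the number of nodes."""
    return nodes * queries_per_node

print(total_capacity(4, 1000))  # 4000
```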
Solution
Step 1: Identify what causes poor scaling
Poor scaling happens if some nodes have too much data or work, causing bottlenecks.
Step 2: Understand uneven data distribution
If data is not spread evenly, some nodes get overloaded while others are idle, hurting performance.
Final Answer:
Data is not evenly distributed across nodes -> Option D
Quick Check:
Uneven data = overloaded nodes = poor scaling [OK]
- Thinking more nodes always cause poor scaling
- Believing replication causes poor scaling
- Assuming multiple machines hurt scaling
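How a partitioning rule produces an overloaded node can be seen by counting keys per node. The sketch compares two illustrative rules that are not taken from the original material: a naive rule based on the first character of the key, and a hash of the whole key. The function names and the `user<i>` key format are assumptions.

```python
import hashlib
from collections import Counter

def first_letter_partition(key, k):
    """Naive rule: choose a node from the first character only."""
    return ord(key[0]) % k

def hash_partition(key, k):
    """Hash rule: choose a node from a stable hash of the whole key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % k

k = 3
keys = [f"user{i}" for i in range(9_000)]

naive = Counter(first_letter_partition(key, k) for key in keys)
hashed = Counter(hash_partition(key, k) for key in keys)

print("naive :", sorted(naive.items()))
print("hashed:", sorted(hashed.items()))
```

Every key starts with the same letter, so the naive rule sends all 9,000 keys to one node and leaves the other two idle, while the hash rule spreads them roughly evenly across the three nodes.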
Solution
Step 1: Understand the need to handle more users
More users mean more queries and data requests, requiring more processing power.
Step 2: Identify how distributed databases handle increased load
Adding more nodes spreads the workload, so the system can handle more users without slowing down.
Final Answer:
Adding more nodes to share the workload -> Option A
Quick Check:
More nodes = shared workload = better scaling [OK]
- Thinking reducing replication improves scaling
- Believing one powerful server can handle all load
- Assuming limiting users is the best scaling method
