Sharding and partitioning in DBMS Theory - Time & Space Complexity
When we split a large database into smaller parts, it helps us handle data faster. We want to know how this splitting affects the time it takes to find or store data.
How does dividing data into shards or partitions change the work needed as data grows?
Analyze the time complexity of querying data in a sharded database.
-- Assume data is split into 4 shards
SELECT * FROM users WHERE user_id = 12345;
-- Query is routed to one shard based on user_id
-- Each shard holds roughly n/4 records
This query looks for a user in one shard instead of the whole database.
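The routing step mentioned in the query comments can be sketched as follows. This is a minimal illustration, assuming simple modulo routing on `user_id`; the names `shard_for`, `insert_user`, `find_user` and the in-memory `shards` list are hypothetical stand-ins, not part of any real database API.

```python
NUM_SHARDS = 4  # matches the "4 shards" assumption in the query comments

def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Return the index of the shard that owns this user_id (modulo routing)."""
    return user_id % num_shards

# Hypothetical in-memory stand-ins for four separate shard databases.
shards = [dict() for _ in range(NUM_SHARDS)]

def insert_user(user_id: int, name: str) -> None:
    """Store the user on the single shard chosen by the routing function."""
    shards[shard_for(user_id)][user_id] = name

def find_user(user_id: int):
    """Look up a user by consulting only the owning shard."""
    return shards[shard_for(user_id)].get(user_id)

insert_user(12345, "alice")
print(shard_for(12345))   # 12345 % 4 = 1
print(find_user(12345))   # alice
```

Real systems often use hashing or range tables instead of a plain modulo, but the idea is the same: the shard key alone decides which shard is searched.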
Look at what repeats when searching data.
- Primary operation: Searching records in one shard.
- How many times: Once per query, but only in one shard, not all.
As total data grows, each shard grows too, but it holds only a fraction of the total, so a single-shard search examines fewer records than a search over all the data at once.
| Input Size (n) | Approx. Operations per Shard (n/4, 4 shards) |
|---|---|
| 10,000 | 2,500 |
| 100,000 | 25,000 |
| 1,000,000 | 250,000 |
Pattern observation: Operations grow with data size but are divided by the number of shards, reducing work per query.
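Time Complexity: O(n/k), where n is the total number of records and k is the number of shards.
This assumes the query scans every record in its shard. The time to search then grows with data size but is divided by the number of shards, so each search touches fewer records. With a B-tree index on user_id, the lookup inside a shard would instead be about O(log(n/k)).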
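To make the O(n/k) estimate concrete, here is a minimal sketch that counts comparisons for one lookup. It assumes no index (a plain linear scan) and modulo routing on `user_id`; both are illustrative assumptions, and the function names are hypothetical.

```python
def linear_search(records, target_id):
    """Scan records in order; return (found, comparisons made)."""
    comparisons = 0
    for user_id in records:
        comparisons += 1
        if user_id == target_id:
            return True, comparisons
    return False, comparisons

def worst_case_comparisons(n, k):
    """Worst-case comparisons for one unindexed lookup with n rows over k shards."""
    # Modulo routing: shard s holds the ids with user_id % k == s.
    shards = [[uid for uid in range(n) if uid % k == s] for s in range(k)]
    missing_id = n  # stored nowhere, so the whole owning shard is scanned
    _, comparisons = linear_search(shards[missing_id % k], missing_id)
    return comparisons

for n in (10_000, 100_000, 1_000_000):
    # columns: n, comparisons with 1 shard, comparisons with 4 shards
    print(n, worst_case_comparisons(n, 1), worst_case_comparisons(n, 4))
```

With 4 shards the counts match the table (2,500, 25,000 and 250,000), and they still grow tenfold whenever n grows tenfold.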
[X] Wrong: "Sharding always makes queries instant regardless of data size."
[OK] Correct: Sharding reduces the data examined per search, but with a fixed number of shards k the time still grows in proportion to the total data; it is only smaller by a constant factor of k.
Understanding how splitting data affects search time helps you design systems that handle growth well, and explain how big systems stay fast as they get bigger.
"What if we increased the number of shards as data grows? How would that change the time complexity?"
Practice
What is the difference between sharding and partitioning in databases?
Solution
Step 1: Understand partitioning
Partitioning splits data inside a single database into smaller parts for easier management and faster queries.
Step 2: Understand sharding
Sharding spreads data across multiple servers or machines to handle very large datasets and improve performance.
Final Answer:
Partitioning divides data within one database; sharding spreads data across multiple servers. -> Option B
Quick Check:
Partitioning = single database, Sharding = multiple servers [OK]
- Confusing sharding with partitioning
- Thinking both are the same
- Assuming partitioning involves multiple servers
Solution
Step 1: Define horizontal partitioning
Horizontal partitioning means dividing a table by rows, so each partition has the same columns but different sets of rows.
Step 2: Check options
The option "Splitting a table into multiple tables with the same columns but different rows" matches this definition exactly, while the others describe different concepts or unrelated actions.
Final Answer:
Splitting a table into multiple tables with the same columns but different rows. -> Option A
Quick Check:
Horizontal partitioning = split rows [OK]
- Mixing horizontal with vertical partitioning
- Thinking partitioning means backup
- Confusing rows with columns
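The definition of horizontal partitioning used in this solution can be sketched in a few lines. The table contents, the column names and the choice of `signup_year` as the partitioning column are all hypothetical, picked only for illustration.

```python
# Hypothetical rows of a "users" table; the column names are illustrative.
users = [
    {"user_id": 1, "name": "Ana", "signup_year": 2022},
    {"user_id": 2, "name": "Ben", "signup_year": 2023},
    {"user_id": 3, "name": "Chi", "signup_year": 2023},
    {"user_id": 4, "name": "Dev", "signup_year": 2024},
]

def partition_rows(rows, key):
    """Group whole rows by the value of `key`; no column is dropped."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[key], []).append(row)
    return partitions

partitions = partition_rows(users, "signup_year")
for year, rows in sorted(partitions.items()):
    print(year, [row["user_id"] for row in rows])
```

Every partition keeps all three columns and only the set of rows differs. Vertical partitioning would instead split the columns.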
Solution
Step 1: Identify the shard key and ranges
The sharding is based on the last digit of user ID: 0-3 on Server 1, 4-6 on Server 2, 7-9 on Server 3.
Step 2: Find the last digit of user ID 27
The last digit of 27 is 7, which falls in the 7-9 range assigned to Server 3.
Final Answer:
Server 3 -> Option A
Quick Check:
User ID 27 ends with 7, so Server 3 [OK]
- Ignoring the last digit and guessing server
- Choosing all servers instead of one
- Mixing up the shard ranges
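The last-digit rule from this solution translates directly into a small routing function. The ranges come from the solution text; the function name `server_for` and the table layout are illustrative assumptions.

```python
# Ranges taken from the solution text: last digit 0-3, 4-6, 7-9.
SHARD_RANGES = [
    (range(0, 4), "Server 1"),
    (range(4, 7), "Server 2"),
    (range(7, 10), "Server 3"),
]

def server_for(user_id: int) -> str:
    """Route a user ID to a server based on its last decimal digit."""
    last_digit = abs(user_id) % 10
    for digits, server in SHARD_RANGES:
        if last_digit in digits:
            return server
    # Unreachable for integers, since the ranges cover every digit 0-9.
    raise ValueError(f"no shard covers digit {last_digit}")

print(server_for(27))  # Server 3
```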
Solution
Step 1: Understand shard key role
The shard key determines how data is split across shards. A poor choice can cause uneven data distribution.
Step 2: Analyze the problem
Uneven shard sizes causing slow queries usually mean the shard key is not distributing data evenly.
Final Answer:
The shard key is not chosen properly, causing uneven data distribution. -> Option D
Quick Check:
Uneven shards = bad shard key choice [OK]
- Blaming hardware without checking shard key
- Confusing sharding with partitioning issues
- Ignoring data distribution patterns
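A quick way to see the effect of the shard key is to count how many rows each candidate key sends to each shard. The sketch below uses made-up data in which 90% of users share one country; the data, the two-shard setup and the `COUNTRY_TO_SHARD` mapping are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical data: 1,000 users, 900 of them in one country.
users = [{"user_id": i, "country": "US" if i % 10 else "NZ"} for i in range(1000)]

# Hypothetical assignment of countries to two shards.
COUNTRY_TO_SHARD = {"US": 0, "NZ": 1}

def shard_sizes(rows, shard_of):
    """Count how many rows each shard receives under a given routing function."""
    return Counter(shard_of(row) for row in rows)

by_country = shard_sizes(users, lambda row: COUNTRY_TO_SHARD[row["country"]])
by_user_id = shard_sizes(users, lambda row: row["user_id"] % 2)

print(dict(by_country))  # skewed: shard 0 gets 900 rows, shard 1 gets 100
print(dict(by_user_id))  # even: 500 rows on each shard
```

Under the skewed key, queries that land on the large shard scan nine times as much data as those on the small one, which is the symptom described in the solution.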
Solution
Step 1: Understand combining sharding and partitioning
Sharding splits data across servers; partitioning splits data inside each server for better management.
Step 2: Analyze the best approach
Sharding by region spreads data geographically, and partitioning by customer type inside each shard improves query speed and organization.
Final Answer:
Shard the database by region across servers, and within each server, partition data by customer type. -> Option C
Quick Check:
Shard by region, partition by type inside servers [OK]
- Mixing up shard and partition levels
- Ignoring partitioning after sharding
- Thinking backup replaces sharding
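The two-level layout from this solution (shard by region, then partition by customer type) can be sketched as a nested mapping. The customer records, field names and helper functions below are hypothetical, and a real deployment would place each region on a separate server rather than in one dictionary.

```python
from collections import defaultdict

# Hypothetical customer records; field names are illustrative only.
customers = [
    {"id": 1, "region": "EU", "type": "retail"},
    {"id": 2, "region": "EU", "type": "business"},
    {"id": 3, "region": "US", "type": "retail"},
    {"id": 4, "region": "US", "type": "retail"},
]

def build_layout(rows):
    """Level 1: region -> server (shard). Level 2: customer type -> partition."""
    servers = defaultdict(lambda: defaultdict(list))
    for row in rows:
        servers[row["region"]][row["type"]].append(row["id"])
    return servers

def find_ids(layout, region, customer_type):
    """Visit one server, then one partition on it; return the matching ids."""
    server = layout.get(region, {})
    return list(server.get(customer_type, []))

layout = build_layout(customers)
print(find_ids(layout, "US", "retail"))    # [3, 4]
print(find_ids(layout, "EU", "business"))  # [2]
```

A query that names both the region and the customer type touches one partition on one server, which is why the two techniques combine well.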
