Sharding and partitioning in DBMS Theory - Time & Space Complexity
When we split a large database into smaller parts, it helps us handle data faster. We want to know how this splitting affects the time it takes to find or store data.
How does dividing data into shards or partitions change the work needed as data grows?
Analyze the time complexity of querying data in a sharded database.
-- Assume data is split into 4 shards
SELECT * FROM users WHERE user_id = 12345;
-- Query is routed to one shard based on user_id
-- Each shard holds roughly n/4 records
This query looks for a user in one shard instead of the whole database.
Look at what repeats when searching data.
- Primary operation: Searching records in one shard.
- How many times: Once per query, but only in one shard, not all.
As total data grows, each shard holds less data compared to searching all data at once.
| Input Size (n) | Approx. Operations per Shard |
|---|---|
| 10,000 | 2,500 |
| 100,000 | 25,000 |
| 1,000,000 | 250,000 |
Pattern observation: Operations grow with data size but are divided by the number of shards, reducing work per query.
Time Complexity: O(n/k)
This means the time to search grows with data size but is divided by the number of shards, making each search faster.
[X] Wrong: "Sharding always makes queries instant regardless of data size."
[OK] Correct: Sharding reduces data per search but the time still grows as total data grows; it just grows slower.
Understanding how splitting data affects search time shows you can design systems that handle growth well. This skill helps you explain how big systems stay fast as they get bigger.
"What if we increased the number of shards as data grows? How would that change the time complexity?"