Hash-based sharding in MongoDB - Time & Space Complexity
When using hash-based sharding in MongoDB, we want to understand how the time to find or store a document changes as the database grows.
We ask: how does the number of operations per query grow as we add more data?
Analyze the time complexity of this hash-based shard key lookup.
```javascript
// Assume a collection sharded by hashed _id.
// hashFunction and findShardForKey are illustrative stand-ins for
// routing work that mongos normally performs transparently.
const shardKey = hashFunction(document._id);
const shard = findShardForKey(shardKey);
const result = db.collection.find({ _id: document._id });
```
This code finds the shard by hashing the document's _id, then queries that shard for the document.
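To make the two routing steps concrete, here is a minimal, self-contained sketch. The names `hashOf`, `findShardForKey`, and `NUM_SHARDS` are illustrative assumptions: MongoDB actually hashes with a 64-bit variant of MD5 and maps hash values to chunks via config metadata, but the shape of the work is the same.

```javascript
// Hypothetical sketch of hashed routing; hashOf and NUM_SHARDS are
// illustrative stand-ins, not MongoDB's actual hashing or chunk map.
const NUM_SHARDS = 4;

// Toy hash: fold the key's characters into a 32-bit integer.
function hashOf(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.charCodeAt(0)) | 0;
  }
  return h >>> 0; // force unsigned
}

// O(1): the hash maps directly to a shard index; no scanning of shards.
function findShardForKey(hashedKey) {
  return hashedKey % NUM_SHARDS;
}

const shard = findShardForKey(hashOf("document-42"));
console.log(shard); // some integer in [0, NUM_SHARDS)
```

Note that neither function looks at how many documents exist; the cost depends only on the key itself.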
Look for repeated steps that affect performance.
- Primary operation: Hashing the shard key and locating the shard.
- How many times: Once per query or insert operation.
As the number of documents grows, the hash function still runs exactly once per operation, and locating the owning shard remains a constant-time metadata lookup.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | ~1 hash + 1 shard lookup |
| 100 | ~1 hash + 1 shard lookup |
| 1000 | ~1 hash + 1 shard lookup |
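The table above can be checked with a small counting experiment. In this sketch, `routeQuery` is a hypothetical helper that tallies one unit of work for the hash and one for the shard lookup; the dataset size `n` never enters the routing cost.

```javascript
// Sketch: count routing work per lookup as the dataset grows.
// opCount is bumped once for the hash and once for the shard lookup,
// regardless of how many documents exist.
function routeQuery(key, numShards) {
  let opCount = 0;
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) | 0;
  opCount++; // one hash computation
  const shard = (h >>> 0) % numShards;
  opCount++; // one shard lookup
  return { shard, opCount };
}

for (const n of [10, 100, 1000]) {
  // n (the dataset size) does not appear anywhere in routeQuery.
  console.log(n, routeQuery("doc-1", 4).opCount); // opCount is always 2
}
```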
Pattern observation: The number of operations stays about the same no matter how big the data is.
Time Complexity: O(1)
This means each query or insert takes about the same time regardless of how much data there is.
[X] Wrong: "Hash-based sharding slows down as data grows because it has to check many shards."
[OK] Correct: The hash function directly points to the correct shard, so only one shard is checked, keeping the operation fast.
Understanding how hash-based sharding keeps queries fast helps you explain how large databases stay efficient as they grow.
"What if we changed from hash-based sharding to range-based sharding? How would the time complexity change?"
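As a starting point for that question, here is a hedged sketch of range-based routing for contrast. The boundaries array is illustrative, not real chunk metadata: with S shards owning contiguous key ranges, locating one key becomes a binary search over the range boundaries, roughly O(log S) instead of a single hash, and range queries can target multiple shards at once.

```javascript
// Illustrative range-based routing: shard i owns keys in
// [rangeBoundaries[i], rangeBoundaries[i + 1]).
const rangeBoundaries = [0, 250, 500, 750];

// Binary search for the last boundary <= key: ~O(log S) for S shards.
function findShardByRange(key) {
  let lo = 0, hi = rangeBoundaries.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (rangeBoundaries[mid] <= key) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

console.log(findShardByRange(600)); // 2: 600 falls in [500, 750)
```

Per-document lookups stay fast either way; the trade-off is that range-based sharding keeps nearby keys together (good for range scans, but prone to hot spots), while hashed sharding spreads them evenly.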