$bucket and $bucketAuto for distribution in MongoDB - Time & Space Complexity
When grouping data into ranges or buckets, it is important to know how the time needed grows as data grows.
We want to see how the $bucket and $bucketAuto stages in MongoDB scale with more input documents.
Analyze the time complexity of the following MongoDB aggregation using $bucketAuto.
db.sales.aggregate([
{
$bucketAuto: {
groupBy: "$price",
buckets: 5
}
}
])
This groups sales documents into 5 buckets based on the price field, automatically deciding bucket ranges.
Look for repeated work done on the data.
- Primary operation: Scanning each document once to assign it to a bucket.
- How many times: Exactly once per document during the grouping phase.
As the number of documents grows, the work grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 document checks |
| 100 | About 100 document checks |
| 1000 | About 1000 document checks |
Pattern observation: Doubling the input roughly doubles the work needed.
Time Complexity: O(n log k)
This means the time to bucket documents grows roughly in proportion to the number of documents times the logarithm of the number of buckets.
[X] Wrong: "$bucketAuto runs in constant time no matter how many documents there are."
[OK] Correct: Each document must be checked to decide its bucket, so more documents mean more work.
Understanding how grouping stages scale helps you design queries that stay efficient as data grows.
What if we increased the number of buckets from 5 to 100? How would the time complexity change?