Range buckets in Elasticsearch - Time & Space Complexity
When using range buckets in Elasticsearch, we group data into ranges to analyze it better.
We want to know how the time to create these buckets changes as we add more data or ranges.
Analyze the time complexity of the following Elasticsearch aggregation using range buckets.
```json
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50 },
          { "from": 50, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}
```
This aggregation groups documents by price into three buckets: below 50, from 50 up to (but not including) 100, and 100 or above. In Elasticsearch range aggregations, the `from` bound is inclusive and the `to` bound is exclusive.
Look at what repeats when Elasticsearch processes this aggregation.
- Primary operation: checking a document's price against a range to see which bucket it falls into.
- How many times: once per range for every document in the data set.
As you add more documents, Elasticsearch checks each one against the ranges.
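The per-document check can be sketched in Python. This is a simplified model, not Elasticsearch internals: `bucket_by_range` is a made-up name, but the bound semantics (`from` inclusive, `to` exclusive, `None` for an open end) mirror the aggregation above.

```python
# Simplified model of a range aggregation: every document's price is
# checked against every range, so total checks = documents x ranges.
def bucket_by_range(prices, ranges):
    buckets = {i: 0 for i in range(len(ranges))}  # one count per range
    checks = 0
    for price in prices:
        for i, (lo, hi) in enumerate(ranges):
            checks += 1  # one comparison per document per range
            # lo is inclusive ("from"), hi is exclusive ("to")
            if (lo is None or price >= lo) and (hi is None or price < hi):
                buckets[i] += 1
    return buckets, checks

ranges = [(None, 50), (50, 100), (100, None)]
buckets, checks = bucket_by_range([25, 75, 150, 50], ranges)
print(buckets)  # {0: 1, 1: 2, 2: 1} -- note 50 lands in the second bucket
print(checks)   # 12 -- 4 documents x 3 ranges
```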
| Documents (n) | Approx. Checks (3 ranges) |
|---|---|
| 10 | About 30 checks |
| 100 | About 300 checks |
| 1000 | About 3000 checks |
Pattern observation: The number of checks grows directly with the number of documents times the number of ranges.
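The table's counts follow directly from documents times ranges; a quick sketch reproduces them (`approx_checks` is an illustrative name, with r fixed at 3 as in the aggregation above):

```python
# Reproduces the table: total checks = documents x ranges.
def approx_checks(n_docs, n_ranges=3):
    return n_docs * n_ranges

for n in (10, 100, 1000):
    print(f"{n} documents -> about {approx_checks(n)} checks")
```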
Time Complexity: O(n * r)
This means the time to build range buckets grows linearly with the number of documents (n) and linearly with the number of ranges (r).
[X] Wrong: "Adding more ranges will multiply the time by the number of ranges squared."
[OK] Correct: Each document is checked against each range, so time grows with documents times ranges. Since the number of ranges is usually small and fixed, growth is dominated by the number of documents.
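The wrong-vs-right intuition above can be checked with arithmetic: doubling the ranges doubles the work, it does not square it (`total_checks` is an illustrative helper, not an Elasticsearch API).

```python
# Doubling r doubles the checks (linear in r), it does not square them.
def total_checks(n_docs, n_ranges):
    return n_docs * n_ranges  # one check per document per range

n = 1000
print(total_checks(n, 3))  # 3000
print(total_checks(n, 6))  # 6000 -- 2x the checks, not 4x
```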
Understanding how range buckets scale helps you explain how Elasticsearch handles grouping large data sets efficiently.
What if we increased the number of ranges significantly? How would the time complexity change?