Bucket aggregations (terms, histogram) in Elasticsearch - Time & Space Complexity
When using bucket aggregations like terms or histogram in Elasticsearch, it is important to know how the amount of work grows as the data grows. Specifically, we want to understand how the number of operations changes when we group documents into buckets.
Analyze the time complexity of the following code snippet.
```json
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_category": {
      "terms": { "field": "category.keyword", "size": 10 }
    }
  }
}
```
This code groups sales documents by category, returning the top 10 categories with their counts.
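For context, the response contains one bucket per category with a document count. Here is a trimmed sketch of what it might look like (the category names and counts are made up for illustration):

```json
{
  "aggregations": {
    "sales_per_category": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "electronics", "doc_count": 420 },
        { "key": "clothing", "doc_count": 310 }
      ]
    }
  }
}
```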
Identify the repeated operations: loops, recursion, or traversals over the data.
- Primary operation: Scanning each document once to assign it to a bucket.
- How many times: Once per document in the index.
As the number of documents grows, Elasticsearch processes each document to place it in the right bucket.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 document checks |
| 100 | About 100 document checks |
| 1000 | About 1000 document checks |
Pattern observation: The work grows directly with the number of documents.
Time Complexity: O(n)
This means the time to complete the aggregation grows linearly with the number of documents.
[X] Wrong: "Bucket aggregations only look at the top N buckets, so time stays the same no matter how many documents."
[OK] Correct: Even if we limit the number of returned buckets, Elasticsearch must still scan every matching document to count and assign it before selecting the top buckets.
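One practical consequence: the n here is the number of documents that match the query, so narrowing the query shrinks the scan. A minimal sketch, assuming a hypothetical order_date field exists on the sales documents:

```json
GET /sales/_search
{
  "size": 0,
  "query": {
    "range": { "order_date": { "gte": "now-30d/d" } }
  },
  "aggs": {
    "sales_per_category": {
      "terms": { "field": "category.keyword", "size": 10 }
    }
  }
}
```

The aggregation still runs in O(n), but n is now the count of matching documents rather than the size of the whole index.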
Understanding how bucket aggregations scale helps you explain how search engines handle grouping and counting large data sets efficiently.
What if we changed the aggregation to a histogram on a numeric field with many buckets? How would the time complexity change?
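To explore that question, here is what such a histogram aggregation might look like; the price field and the interval of 10 are assumptions about the index mapping:

```json
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_price_band": {
      "histogram": { "field": "price", "interval": 10 }
    }
  }
}
```

As you reason about it, note that each document's bucket is computed from its value with simple arithmetic, and the engine still makes a single pass over the matching documents.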