How Aggregations Summarize Data in Elasticsearch - A Performance Analysis
When using aggregations in Elasticsearch, it is important to understand how the time to summarize data grows as the number of documents increases.
Analyze the time complexity of the following aggregation query.
```json
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "total_revenue": {
      "sum": { "field": "price" }
    }
  }
}
```
This query calculates the total sum of the "price" field across all sales documents. Setting `"size": 0` tells Elasticsearch to return only the aggregation result, with no document hits.
Look at what repeats as the data size grows.
- Primary operation: Summing the "price" field for each document.
- How many times: Once for every document in the index.
As the number of documents increases, the aggregation must add more values.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 sums |
| 100 | 100 sums |
| 1000 | 1000 sums |
Pattern observation: The work grows directly with the number of documents.
Time Complexity: O(n)
This means the time to compute the sum grows linearly with the number of documents.
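The linear pattern in the table above can be sketched with a small simulation. This is not how Elasticsearch is implemented internally; it is a minimal model in which each document contributes its "price" value through exactly one addition, and the document list is hypothetical.

```python
def sum_aggregation(docs):
    """Sum the 'price' field across all documents: one addition per doc, so O(n)."""
    total = 0.0
    operations = 0
    for doc in docs:            # visits every document exactly once
        total += doc["price"]
        operations += 1
    return total, operations

# Doubling the number of documents doubles the number of additions.
docs_small = [{"price": 10.0}] * 100
docs_large = [{"price": 10.0}] * 200

_, ops_small = sum_aggregation(docs_small)
_, ops_large = sum_aggregation(docs_large)
print(ops_small, ops_large)  # 100 200
```

Counting the operations directly makes the pattern from the table visible: the work tracks the input size one-for-one.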
[X] Wrong: "Aggregations always run instantly no matter how much data there is."
[OK] Correct: Aggregations must look at each document to summarize data, so more documents mean more work and more time.
Understanding how aggregations scale helps you explain how search engines handle large data efficiently and why performance matters.
What if we changed the aggregation from a sum to a terms aggregation grouping by a field? How would the time complexity change?
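As a starting point for thinking about that question, here is a hedged sketch of what a terms-style grouping does conceptually, using a hypothetical "category" field. It still visits each document once, so the time stays O(n); what changes is that it must keep one bucket per distinct term, so memory grows with the number of unique values.

```python
def terms_aggregation(docs, key_field, value_field):
    """Group documents by key_field and sum value_field per group.

    One pass over the documents (O(n) time); one bucket per distinct
    key (O(k) memory for k unique terms).
    """
    buckets = {}
    for doc in docs:  # still one visit per document
        key = doc[key_field]
        buckets[key] = buckets.get(key, 0.0) + doc[value_field]
    return buckets

docs = [
    {"category": "books", "price": 10.0},
    {"category": "games", "price": 25.0},
    {"category": "books", "price": 5.0},
]
print(terms_aggregation(docs, "category", "price"))
# {'books': 15.0, 'games': 25.0}
```

The sketch suggests the answer: grouping does not change the linear time bound, but it adds a memory cost tied to the number of distinct terms, which is why Elasticsearch limits how many term buckets it returns.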