Stats and extended stats in Elasticsearch - Time & Space Complexity
When using the `stats` and `extended_stats` aggregations in Elasticsearch, it's important to know how query time grows as the data grows.
The goal here is to understand how the number of matching documents affects the time Elasticsearch takes to calculate these statistics.
Analyze the time complexity of the following Elasticsearch aggregation query.
```json
{
  "size": 0,
  "aggs": {
    "stats_example": {
      "stats": { "field": "price" }
    },
    "extended_stats_example": {
      "extended_stats": { "field": "price" }
    }
  }
}
```
This query computes basic stats (count, min, max, sum, avg) and extended stats (which add variance, standard deviation, and sum of squares) on the `price` field across all matching documents.
Look at what repeats as Elasticsearch processes the data.
- Primary operation: Reading each document's `price` value and updating the running statistics.
- How many times: Once for every document that matches the query.
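The per-document update above can be sketched in plain Python. This is an illustrative model of the single pass Elasticsearch makes, not its actual internals; the function name and use of Welford's method for the variance are assumptions for the sketch (Elasticsearch's `extended_stats` reports the population variance, which Welford's update also yields).

```python
import math

def stats_pass(prices):
    """Compute stats + extended stats in one O(n) scan (Welford's method).

    Illustrative sketch only -- not Elasticsearch's real implementation.
    """
    count = 0
    total = 0.0
    minimum = math.inf
    maximum = -math.inf
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for price in prices:              # one update per matching document
        count += 1
        total += price
        minimum = min(minimum, price)
        maximum = max(maximum, price)
        delta = price - mean
        mean += delta / count         # incremental mean update
        m2 += delta * (price - mean)  # incremental variance accumulator
    variance = m2 / count if count else 0.0
    return {
        "count": count, "min": minimum, "max": maximum,
        "sum": total, "avg": mean,
        "variance": variance, "std_deviation": math.sqrt(variance),
    }

print(stats_pass([10.0, 20.0, 30.0, 40.0]))
```

Every field is maintained with constant work per document, which is exactly why the total cost scales with the number of documents.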
As the number of documents increases, the work to calculate these statistics grows linearly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 operations scanning "price" values |
| 100 | 100 operations scanning "price" values |
| 1000 | 1000 operations scanning "price" values |
Pattern observation: The number of operations grows directly with the number of documents.
Time Complexity: O(n)
This means the time to compute stats grows in direct proportion to the number of documents.
[X] Wrong: "Stats calculations are instant no matter how many documents there are."
[OK] Correct: Elasticsearch must look at each document's field value to calculate stats, so more documents mean more work and more time.
Understanding how stats aggregations scale helps you explain performance in real projects and shows you can reason clearly about the impact of data size.
"What if we added a filter to reduce the documents before stats calculation? How would the time complexity change?"