Cardinality aggregation in Elasticsearch - Time & Space Complexity
When using the cardinality aggregation in Elasticsearch, we want to know how the time to count unique values changes as the data grows.
We ask: how does the work increase as the index holds more documents?
Analyze the time complexity of the following code snippet.
```
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id"
      }
    }
  }
}
```
This query counts the number of unique user IDs in the index using the cardinality aggregation; `"size": 0` suppresses the document hits so only the aggregation result is returned.
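To make the scan concrete, here is a small Python model of what the aggregation conceptually does: visit every document once and track distinct `user_id` values. (This is an illustration only; Elasticsearch does not use an exact set internally, and the `docs` list here is made-up sample data.)

```python
def count_unique(docs):
    """Model of a cardinality count: one pass over all documents."""
    seen = set()              # exact tracking for illustration;
                              # Elasticsearch uses an approximate structure
    for doc in docs:          # one visit per document -> O(n) time
        seen.add(doc["user_id"])
    return len(seen)

# Hypothetical sample data: six documents, three distinct users.
docs = [{"user_id": u} for u in ["a", "b", "a", "c", "b", "a"]]
print(count_unique(docs))
```

The single `for` loop over `docs` is the whole story: each document contributes exactly one unit of work.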
Identify the repeated work: the loops, recursion, or per-document traversals.
- Primary operation: Scanning each document's user_id field once.
- How many times: Once per document in the index.
As the number of documents grows, the aggregation must look at more user_id values.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks of user_id |
| 100 | 100 checks of user_id |
| 1000 | 1000 checks of user_id |
Pattern observation: The work grows directly with the number of documents scanned.
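The table above can be reproduced by instrumenting the model with an operation counter (again a sketch; the `ops_for` helper and the modulo-generated user IDs are made up for illustration):

```python
def ops_for(n):
    """Count how many per-document checks a cardinality scan of n docs performs."""
    docs = [{"user_id": i % 50} for i in range(n)]  # synthetic docs, 50 distinct users
    seen = set()
    ops = 0
    for doc in docs:
        ops += 1              # one check of user_id per document
        seen.add(doc["user_id"])
    return ops

for n in (10, 100, 1000):
    print(n, ops_for(n))      # operations grow in lockstep with n
```

Note that the operation count tracks the number of documents, not the number of distinct users: 1000 documents cost 1000 checks even though only 50 users exist.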
Time Complexity: O(n)
This means the time to count unique values grows linearly with the number of documents. Memory, by contrast, stays bounded: Elasticsearch implements this aggregation with the approximate HyperLogLog++ algorithm rather than an exact set.
[X] Wrong: "Cardinality aggregation is instant no matter how many documents there are."
[OK] Correct: The aggregation must check each document's field once, so more documents mean more work.
Understanding how cardinality aggregation scales helps you explain performance in real search systems and shows you can reason about data size effects.
"What if we added a precision threshold to the cardinality aggregation? How would that affect the time complexity?"
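As a starting point for that question: `precision_threshold` controls the memory/accuracy trade-off of the HyperLogLog++ structure behind the aggregation, but each document is still hashed exactly once, so time complexity remains O(n). Below is a deliberately simplified, toy approximate counter in the same spirit (the function name, register count `p`, and use of SHA-1 are all illustrative choices, not Elasticsearch's actual implementation):

```python
import hashlib

def approx_unique(values, p=10):
    """Toy HyperLogLog-style estimate: fixed memory (2**p registers), one pass."""
    m = 1 << p                      # register count: more registers = more
    registers = [0] * m             # memory and accuracy, same O(n) time
    for v in values:
        h = int(hashlib.sha1(str(v).encode()).hexdigest(), 16)
        idx = h & (m - 1)           # low p bits select a register
        rest = h >> p
        rank = 1                    # position of the first set bit
        while rest & 1 == 0 and rank <= 64:
            rank += 1
            rest >>= 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)            # bias-correction constant
    return alpha * m * m / sum(2.0 ** -r for r in registers)
```

Raising `p` shrinks the estimation error but does not change the per-document cost, which is why a higher `precision_threshold` affects memory and accuracy, not the linear time growth.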