How to Use Cardinality Aggregation in Elasticsearch
Use the cardinality aggregation in Elasticsearch to count the number of unique values in a field. It returns an approximate count of distinct values using the HyperLogLog++ algorithm, which stays memory-efficient on large datasets. You specify the field inside the cardinality aggregation in your query's aggs section.
Syntax
A cardinality aggregation consists of a name you choose, the aggregation type cardinality, and the field whose unique values are counted. Optionally, you can set precision_threshold to trade memory for accuracy.
- aggregation_name: Your chosen name for the aggregation.
- field: The field to count unique values on.
- precision_threshold (optional): A number from 0 to 40000 (default 3000). Counts below the threshold are expected to be close to exact; higher values use more memory.
json
{
  "aggs": {
    "unique_count": {
      "cardinality": {
        "field": "your_field",
        "precision_threshold": 1000
      }
    }
  }
}
Example
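This example counts the unique user IDs in an index called logs. Setting size to 0 omits the search hits, so no documents are returned alongside the aggregation result. The output shows only the aggregations part of the response.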
json
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id.keyword"
      }
    }
  }
}
Output
{
  "aggregations": {
    "unique_users": {
      "value": 1234
    }
  }
}
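The same request can be prepared from a script. The Python sketch below is a minimal illustration, not an official client recipe: the helper names build_unique_count_query and extract_unique_count are hypothetical, and the response is the sample output shown above rather than a live result. Sending the body to the cluster is left out so the snippet runs without a server.

```python
import json


def build_unique_count_query(agg_name, field):
    """Return a search body that only computes a cardinality aggregation."""
    return {
        "size": 0,  # skip search hits; only the aggregation is needed
        "aggs": {agg_name: {"cardinality": {"field": field}}},
    }


def extract_unique_count(response, agg_name):
    """Read the approximate distinct count out of a search response dict."""
    return response["aggregations"][agg_name]["value"]


query = build_unique_count_query("unique_users", "user_id.keyword")
print(json.dumps(query, indent=2))

# Sample response copied from the output above (not a live result).
sample_response = {"aggregations": {"unique_users": {"value": 1234}}}
print(extract_unique_count(sample_response, "unique_users"))
```

Keeping the query builder separate from the response parser makes it easy to reuse the same aggregation name in both places.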
Common Pitfalls
Common mistakes include:
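- Using a text field without a keyword subfield. By default the request fails because fielddata is disabled on text fields; if fielddata is enabled, the count reflects analyzed tokens rather than whole values.
- Leaving precision_threshold at its default when higher accuracy is needed. Raising it keeps counts below the threshold close to exact, but results above it remain approximate.
- Expecting exact counts on very large datasets; cardinality is approximate by design.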
Always use keyword or numeric fields for cardinality aggregation.
json
// Wrong usage (assuming user_id is mapped as text):
{
  "aggs": {
    "wrong_usage": {
      "cardinality": {
        "field": "user_id"
      }
    }
  }
}
// Correct usage:
{
  "aggs": {
    "correct_usage": {
      "cardinality": {
        "field": "user_id.keyword"
      }
    }
  }
}
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| field | The field to count unique values on | none (required) |
| precision_threshold | Controls accuracy and memory (0-40000) | 3000 |
| missing | Value to substitute for documents that have no value in the field | none (such documents are ignored) |
| rehash | Legacy option, deprecated with no replacement; values are always rehashed | n/a |
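As a rough illustration of how these parameters combine, the following Python sketch assembles an aggregation body and estimates its memory cost. The function names are hypothetical. The estimate relies on the rule of thumb from the Elasticsearch documentation of about 8 bytes per unit of precision_threshold, and the clamp reflects that thresholds above 40000 have the same effect as 40000.

```python
MAX_PRECISION_THRESHOLD = 40000  # larger values behave like 40000


def cardinality_agg(field, precision_threshold=None, missing=None):
    """Build the body of a cardinality aggregation (hypothetical helper)."""
    body = {"field": field}
    if precision_threshold is not None:
        if precision_threshold < 0:
            raise ValueError("precision_threshold must not be negative")
        body["precision_threshold"] = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    if missing is not None:
        body["missing"] = missing
    return {"cardinality": body}


def approx_memory_bytes(precision_threshold):
    """Rule-of-thumb memory use per bucket: about threshold * 8 bytes."""
    return min(precision_threshold, MAX_PRECISION_THRESHOLD) * 8


agg = cardinality_agg("user_id.keyword", precision_threshold=50000, missing="N/A")
print(agg)
print(approx_memory_bytes(3000))  # default threshold: about 24000 bytes
```

Because the cost applies per bucket, a high threshold inside a terms aggregation with many buckets multiplies quickly.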
Key Takeaways
Use cardinality aggregation to get an approximate count of unique values efficiently.
Always use keyword or numeric fields, not analyzed text fields, for cardinality aggregation.
Adjust precision_threshold to balance accuracy and memory usage.
Cardinality aggregation results are approximate, not exact, especially on large datasets.
Include cardinality aggregation inside the aggs section of your Elasticsearch query.
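
To see why the count is approximate, the sketch below implements a basic HyperLogLog estimator in Python. It is a simplified illustration only: Elasticsearch uses the more refined HyperLogLog++ variant, and the hash function (SHA-1), the register count, and the function name hll_estimate are assumptions chosen for readability, not what Elasticsearch does internally.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with basic HyperLogLog.

    p is the number of hash bits used to pick a register (2**p registers).
    """
    m = 1 << p
    registers = [0] * m
    for v in values:
        # 64-bit hash of the value (SHA-1 is an illustrative choice).
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                  # top p bits choose the register
        rest = h & ((1 << (64 - p)) - 1)     # remaining bits
        # Position of the first 1-bit in the remaining bits, counted from 1.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


user_ids = [f"user-{i}" for i in range(50000)]
print(round(hll_estimate(user_ids)))             # close to 50000, not exact
print(round(hll_estimate(user_ids + user_ids)))  # duplicates do not change it
```

Each register stores only a small number regardless of how many values are seen, which is why memory stays fixed while the result carries a small error.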