
How to Use Cardinality Aggregation in Elasticsearch

Use the cardinality aggregation in Elasticsearch to count the unique values in a field. It returns an approximate distinct count computed with the HyperLogLog++ algorithm, which keeps memory usage bounded even on large datasets. You specify the field inside a cardinality aggregation in the aggs section of your search request.

Syntax

A cardinality aggregation consists of a name you choose, the aggregation type cardinality, and the field whose unique values are counted. Optionally, you can set precision_threshold to trade memory for accuracy.

  • aggregation_name: Your chosen name for the aggregation.
  • field: The field to count unique values on.
  • precision_threshold (optional): A number from 0 to 40000 (default 3000). Counts below the threshold are expected to be close to exact; higher values use more memory.
json
{
  "aggs": {
    "unique_count": {
      "cardinality": {
        "field": "your_field",
        "precision_threshold": 1000
      }
    }
  }
}
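
If you build requests from application code, the syntax above can be wrapped in a small helper. The following is a minimal Python sketch using only the standard library. The function names build_cardinality_query and approx_memory_bytes are hypothetical, rejecting out-of-range thresholds is this helper's own choice rather than Elasticsearch behavior, and the memory figure follows the rough rule of about 8 bytes per unit of threshold given in the Elasticsearch documentation.

```python
import json

MAX_PRECISION_THRESHOLD = 40000  # upper bound given in the parameter list above


def build_cardinality_query(agg_name, field, precision_threshold=None):
    """Return a search request body with a single cardinality aggregation."""
    cardinality = {"field": field}
    if precision_threshold is not None:
        # Assumption: this helper rejects out-of-range values itself.
        if not 0 <= precision_threshold <= MAX_PRECISION_THRESHOLD:
            raise ValueError("precision_threshold must be between 0 and 40000")
        cardinality["precision_threshold"] = precision_threshold
    # "size": 0 leaves out the matching documents; only the aggregation is returned.
    return {"size": 0, "aggs": {agg_name: {"cardinality": cardinality}}}


def approx_memory_bytes(precision_threshold):
    """Rough memory estimate per aggregation: about 8 bytes per unit of threshold."""
    return precision_threshold * 8


body = build_cardinality_query("unique_count", "your_field", precision_threshold=1000)
print(json.dumps(body, indent=2))
print(approx_memory_bytes(1000))  # 8000
```

The printed body can be sent as the JSON payload of a search request with whichever HTTP or Elasticsearch client you already use.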

Example

This example counts the unique user IDs in an index called logs. Setting size to 0 leaves the matching documents out of the response, so only the aggregation result is returned. The value in the output is illustrative.

json
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id.keyword"
      }
    }
  }
}
Output (aggregations part of the response)

json
{
  "aggregations": {
    "unique_users": {
      "value": 1234
    }
  }
}
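
To use the count in application code, read the value from the aggregations part of the response. The sketch below parses the example output shown above with Python's standard json module. The helper name extract_cardinality is hypothetical, and a real response also carries other top-level keys that are ignored here.

```python
import json

# Response text copied from the example output above.
raw_response = '{ "aggregations": { "unique_users": { "value": 1234 } } }'


def extract_cardinality(response, agg_name):
    """Return the approximate distinct count for a named cardinality aggregation."""
    return response["aggregations"][agg_name]["value"]


response = json.loads(raw_response)
unique_users = extract_cardinality(response, "unique_users")
print(f"approximate unique users: {unique_users}")  # approximate unique users: 1234
```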

Common Pitfalls

Common mistakes include:

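  • Running the aggregation on a text field instead of its keyword subfield. By default this fails because fielddata is disabled on text fields; if fielddata is enabled, the aggregation counts analyzed tokens rather than whole values.
  • Leaving precision_threshold at its default when the expected number of distinct values is higher than the threshold, which makes the result less accurate.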
  • Expecting exact counts on very large datasets; cardinality is approximate by design.

Always use keyword or numeric fields for cardinality aggregation.

Wrong usage (user_id is an analyzed text field):

json
{
  "aggs": {
    "wrong_usage": {
      "cardinality": {
        "field": "user_id"
      }
    }
  }
}

Correct usage (keyword subfield):

json
{
  "aggs": {
    "correct_usage": {
      "cardinality": {
        "field": "user_id.keyword"
      }
    }
  }
}
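
One way to avoid the text-field mistake is to check the index mapping before building the query. The sketch below is a simplified illustration in Python: the function name aggregatable_field and the sample properties dictionary are hypothetical, the dictionary mirrors the shape of the properties section of an index mapping, and every non-text type is assumed to be safe to aggregate on.

```python
def aggregatable_field(properties, field):
    """Pick a field name that a cardinality aggregation can use."""
    spec = properties[field]
    if spec.get("type") != "text":
        # Simplifying assumption: non-text types can be aggregated directly.
        return field
    # For text fields, look for a keyword subfield such as user_id.keyword.
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    raise ValueError(f"text field {field!r} has no keyword subfield")


# Hypothetical mapping properties for the logs index.
properties = {
    "user_id": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "status_code": {"type": "integer"},
    "message": {"type": "text"},
}

print(aggregatable_field(properties, "user_id"))      # user_id.keyword
print(aggregatable_field(properties, "status_code"))  # status_code
```

Calling the helper on message raises a ValueError, because that field has no keyword subfield to fall back on.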

Quick Reference

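Parameter           | Description                                                        | Default
field               | The field to count unique values on                                | none (required)
precision_threshold | Controls accuracy and memory (0-40000)                             | 3000
missing             | Value substituted for documents that have no value in the field    | none (such documents are ignored)
rehash              | Whether to rehash values before counting (legacy option from early Elasticsearch releases; not listed in current documentation) | true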
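
As an illustration of the missing parameter, the following Python sketch builds a request in which documents without a user ID are treated as if they had the placeholder value "N/A", so together they add at most one extra distinct value to the count. The helper name cardinality_with_missing and the placeholder value are assumptions made for this example.

```python
import json


def cardinality_with_missing(agg_name, field, missing_value):
    """Build a cardinality query that substitutes a value for documents lacking the field."""
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "cardinality": {
                    "field": field,
                    # Documents with no value in the field are counted as this value.
                    "missing": missing_value,
                }
            }
        },
    }


query = cardinality_with_missing("unique_users", "user_id.keyword", "N/A")
print(json.dumps(query, indent=2))
```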

Key Takeaways

  • Use cardinality aggregation to get an approximate count of unique values efficiently.
  • Always use keyword or numeric fields, not analyzed text fields, for cardinality aggregation.
  • Adjust precision_threshold to balance accuracy and memory usage.
  • Cardinality aggregation results are approximate, not exact, especially on large datasets.
  • Include cardinality aggregation inside the aggs section of your Elasticsearch query.