How to Use Cardinality Aggregation in Elasticsearch

Use the cardinality aggregation in Elasticsearch to count the unique values in a field. It returns an approximate distinct count computed with the HyperLogLog++ algorithm, which keeps memory use bounded even on large datasets. You specify the field inside a cardinality aggregation in the aggs section of your search request.
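To get a feel for why the count is approximate, here is a toy sketch of the plain HyperLogLog idea in Python. It is a simplified illustration under stated assumptions (SHA-1 as the hash, 4096 registers, the classic small-range correction). It is not HyperLogLog++ and not Elasticsearch's implementation, and the function name hll_estimate is hypothetical.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with plain HyperLogLog (toy version)."""
    m = 1 << p                      # number of registers (4096 when p=12)
    registers = [0] * m
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash of the value
        index = h >> (64 - p)                       # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)            # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1     # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)                # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting while registers are empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


# 30,000 events from 10,000 distinct users: duplicates do not change the registers.
events = [f"user-{i % 10000}" for i in range(30000)]
print(round(hll_estimate(events)), "estimated vs", len(set(events)), "exact")
```

The estimate lands close to the exact count but rarely matches it, while memory stays fixed at the register array regardless of how many values are fed in.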

📐

Syntax

A cardinality aggregation needs a name of your choosing, the aggregation type cardinality, and the field whose unique values you want to count. Optionally, set precision_threshold to trade memory for accuracy.

aggregation_name: Your chosen name for the aggregation.
field: The field to count unique values on.
precision_threshold (optional): A number from 0 to 40000 (default 3000). Counts below the threshold are expected to be close to exact; higher values improve accuracy at the cost of memory.

json

{
  "aggs": {
    "unique_count": {
      "cardinality": {
        "field": "your_field",
        "precision_threshold": 1000
      }
    }
  }
}
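If you assemble requests in application code, the same structure can be built programmatically. The following is a minimal Python sketch; the helper name cardinality_agg is hypothetical, and the range check simply mirrors the 0 to 40000 limit described above.

```python
import json

MAX_PRECISION_THRESHOLD = 40000  # upper bound stated in this article


def cardinality_agg(name, field, precision_threshold=None):
    """Build the "aggs" part of a search body (hypothetical helper)."""
    spec = {"field": field}
    if precision_threshold is not None:
        if not 0 <= precision_threshold <= MAX_PRECISION_THRESHOLD:
            raise ValueError("precision_threshold must be between 0 and 40000")
        spec["precision_threshold"] = precision_threshold
    return {"aggs": {name: {"cardinality": spec}}}


body = cardinality_agg("unique_count", "your_field", precision_threshold=1000)
print(json.dumps(body, indent=2))
```

Printing the result reproduces the JSON body shown above, which makes it easy to compare generated requests with hand-written ones.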

💻

Example

This example counts the unique user IDs in an index called logs. It shows the query and the shape of the response. Setting size to 0 skips the search hits so that only the aggregation result is returned.

json

{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id.keyword"
      }
    }
  }
}

Output

{ "aggregations": { "unique_users": { "value": 1234 } } }
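To run the query from code and read the result, something like the following Python sketch could work. It uses only the standard library; the address http://localhost:9200 and both helper names are assumptions for illustration. The request is built but not sent, and the response parsed here is the sample output above rather than a live result.

```python
import json
from urllib.request import Request

query = {
    "size": 0,
    "aggs": {"unique_users": {"cardinality": {"field": "user_id.keyword"}}},
}


def build_search_request(base_url, index, body):
    """Prepare a POST to the index's _search endpoint (hypothetical helper)."""
    return Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cardinality_value(response, agg_name):
    """Pull the approximate distinct count out of a search response dict."""
    return response["aggregations"][agg_name]["value"]


# Assumed local cluster address; pass the request to urllib.request.urlopen to send it.
request = build_search_request("http://localhost:9200", "logs", query)
print(request.get_method(), request.full_url)

sample_response = json.loads('{ "aggregations": { "unique_users": { "value": 1234 } } }')
print(cardinality_value(sample_response, "unique_users"))
```

The aggregation name in the response matches the name used in the request, so unique_users is the key to look up.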

⚠️

Common Pitfalls

Common mistakes include:

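Using a text field without a keyword subfield. Aggregating on an analyzed text field fails by default because fielddata is disabled, and if fielddata is enabled it counts individual terms rather than whole values.
Leaving precision_threshold at its default of 3000 when you need tighter accuracy at higher cardinalities. Raising it (up to 40000) improves accuracy, but the result is still an approximation.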
Expecting exact counts on very large datasets; cardinality is approximate by design.

Always use keyword or numeric fields for cardinality aggregation.

Wrong usage (assuming user_id is mapped as an analyzed text field):

json

{
  "aggs": {
    "wrong_usage": {
      "cardinality": {
        "field": "user_id"
      }
    }
  }
}

Correct usage:

json

{
  "aggs": {
    "correct_usage": {
      "cardinality": {
        "field": "user_id.keyword"
      }
    }
  }
}
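One way to avoid the text-field pitfall is to inspect the index mapping before choosing the field name. The sketch below is a hypothetical helper, not part of any Elasticsearch client. It assumes you already have the properties section of the mapping as a Python dict and only handles top-level fields.

```python
def aggregatable_field(properties, field):
    """Return a field name suitable for cardinality, or None if there is none.

    `properties` is assumed to be the "properties" dict from an index mapping.
    """
    spec = properties.get(field)
    if spec is None:
        return None                      # field is not mapped at all
    if spec.get("type") != "text":
        return field                     # keyword, numeric, date, etc.
    # Analyzed text: look for a keyword multi-field such as user_id.keyword.
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None                          # text field with no keyword subfield


# Example mapping, shaped like the default dynamic mapping for string values.
properties = {
    "user_id": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "status_code": {"type": "integer"},
    "message": {"type": "text"},
}

print(aggregatable_field(properties, "user_id"))
print(aggregatable_field(properties, "status_code"))
print(aggregatable_field(properties, "message"))
```

A None result signals that the mapping needs a keyword subfield (or a different field) before a cardinality aggregation will give meaningful counts.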

📊

Quick Reference

Parameter	Description	Default
field	The field to count unique values on	none (required)
precision_threshold	Controls accuracy and memory (0-40000)	3000
missing	Value to substitute for documents that have no value in the field	none (such documents are ignored)
rehash	Deprecated legacy option; current versions ignore it and always hash values	true (legacy)
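The effect of missing is easiest to see on a handful of documents. This Python sketch emulates the semantics locally with an exact set-based count; it is an illustration only, not how Elasticsearch computes the result, and distinct_count is a hypothetical name.

```python
def distinct_count(docs, field, missing=None):
    """Exact local emulation: count distinct values, optionally filling gaps."""
    seen = set()
    for doc in docs:
        value = doc.get(field, missing)   # substitute `missing` when the field is absent
        if value is not None:             # without a substitute, the document is skipped
            seen.add(value)
    return len(seen)


docs = [
    {"user_id": "alice"},
    {"user_id": "bob"},
    {"user_id": "alice"},
    {"path": "/health"},                  # no user_id in this document
]

print(distinct_count(docs, "user_id"))                 # documents without the field are ignored
print(distinct_count(docs, "user_id", missing="N/A"))  # they count as one extra value
```

With a missing value set, all documents lacking the field collapse into a single extra distinct value, so the count rises by at most one.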

✅

Key Takeaways

Use cardinality aggregation to get an approximate count of unique values efficiently.

Always use keyword or numeric fields, not analyzed text fields, for cardinality aggregation.

Adjust precision_threshold to balance accuracy and memory usage.

Cardinality aggregation results are approximate, not exact, especially on large datasets.

Include cardinality aggregation inside the aggs section of your Elasticsearch query.