
How to Use top_hits Aggregation in Elasticsearch

Use the top_hits aggregation in Elasticsearch to get the most relevant or recent documents within each bucket of a parent aggregation. It returns the actual documents instead of just counts or metrics, making it useful for showing sample hits per group.
📐

Syntax

The top_hits aggregation is usually nested inside a bucket aggregation such as terms. Its main options are size to limit the returned hits, sort to order them, and _source to control which fields are returned.

  • size: Maximum number of documents to return per bucket (defaults to 3).
  • sort: How to order the documents (e.g., by date descending). Defaults to the relevance score of the main query.
  • _source: Fields to include in the returned documents.
json
{
  "aggs": {
    "group_by_field": {
      "terms": {
        "field": "field_name"
      },
      "aggs": {
        "top_hits_example": {
          "top_hits": {
            "size": 3,
            "sort": [
              {"date": {"order": "desc"}}
            ],
            "_source": ["field1", "field2"]
          }
        }
      }
    }
  }
}
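The same structure can also be assembled programmatically. The following is a minimal Python sketch, not an official client API: the helper name build_top_hits_query and its parameters are made up for illustration, and it only builds the request body as a plain dictionary. It also adds a top-level "size": 0, which the syntax block above leaves out.

```python
import json


def build_top_hits_query(group_field, sort_field, size=3, source_fields=None):
    """Build a search body that nests top_hits under a terms aggregation.

    Hypothetical helper for illustration; the aggregation names mirror
    the syntax block above.
    """
    top_hits = {
        "size": size,
        # Newest first, assuming sort_field is a date or numeric field
        "sort": [{sort_field: {"order": "desc"}}],
    }
    if source_fields:
        top_hits["_source"] = list(source_fields)

    return {
        "size": 0,  # skip the regular search hits, keep only aggregations
        "aggs": {
            "group_by_field": {
                "terms": {"field": group_field},
                "aggs": {"top_hits_example": {"top_hits": top_hits}},
            }
        },
    }


body = build_top_hits_query(
    "field_name", "date", size=3, source_fields=["field1", "field2"]
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a search request with whichever HTTP or Elasticsearch client you already use.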
💻

Example

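This example groups documents by the user.keyword field and returns the 2 most recent documents per user, showing only the message and timestamp fields. The top-level "size": 0 suppresses the regular search hits so that the response contains only the aggregation results.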

json
{
  "size": 0,
  "aggs": {
    "users": {
      "terms": {
        "field": "user.keyword"
      },
      "aggs": {
        "top_user_hits": {
          "top_hits": {
            "size": 2,
            "sort": [
              {"timestamp": {"order": "desc"}}
            ],
            "_source": ["message", "timestamp"]
          }
        }
      }
    }
  }
}
Output
{
  "aggregations": {
    "users": {
      "buckets": [
        {
          "key": "alice",
          "doc_count": 5,
          "top_user_hits": {
            "hits": {
              "total": {"value": 5},
              "hits": [
                {"_source": {"message": "Hello world", "timestamp": "2024-06-01T12:00:00"}},
                {"_source": {"message": "Another message", "timestamp": "2024-05-30T09:00:00"}}
              ]
            }
          }
        },
        {
          "key": "bob",
          "doc_count": 3,
          "top_user_hits": {
            "hits": {
              "total": {"value": 3},
              "hits": [
                {"_source": {"message": "Bob's latest", "timestamp": "2024-06-02T15:00:00"}},
                {"_source": {"message": "Bob's earlier", "timestamp": "2024-05-28T08:00:00"}}
              ]
            }
          }
        }
      ]
    }
  }
}
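To use these results in application code, you walk the buckets and read each bucket's nested hits. The sketch below assumes the response has already been parsed into a Python dictionary; the helper name hits_per_bucket is hypothetical, and the sample data is the abbreviated output above (a real response also carries metadata such as _index and _id on each hit).

```python
def hits_per_bucket(response, terms_agg, top_hits_agg):
    """Map each bucket key to the list of _source dicts from its top_hits."""
    result = {}
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        # top_hits results sit under <agg name> -> hits -> hits
        hits = bucket[top_hits_agg]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"] for hit in hits]
    return result


# Abbreviated response copied from the output above
response = {
    "aggregations": {
        "users": {
            "buckets": [
                {
                    "key": "alice",
                    "doc_count": 5,
                    "top_user_hits": {"hits": {"total": {"value": 5}, "hits": [
                        {"_source": {"message": "Hello world", "timestamp": "2024-06-01T12:00:00"}},
                        {"_source": {"message": "Another message", "timestamp": "2024-05-30T09:00:00"}},
                    ]}},
                },
                {
                    "key": "bob",
                    "doc_count": 3,
                    "top_user_hits": {"hits": {"total": {"value": 3}, "hits": [
                        {"_source": {"message": "Bob's latest", "timestamp": "2024-06-02T15:00:00"}},
                        {"_source": {"message": "Bob's earlier", "timestamp": "2024-05-28T08:00:00"}},
                    ]}},
                },
            ]
        }
    }
}

grouped = hits_per_bucket(response, "users", "top_user_hits")
for user, docs in grouped.items():
    for doc in docs:
        print(user, doc["timestamp"], doc["message"])
```

Note that doc_count is the number of documents in the bucket, while the inner hits list holds at most size of them.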
⚠️

Common Pitfalls

Common mistakes when using top_hits include:

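  • Setting size too high, which slows queries and inflates responses. The combined from + size value is also capped by the index.max_inner_result_window index setting (100 by default).
  • Omitting sort. Hits then fall back to the relevance score of the main query, which gives an arbitrary order when every document scores the same (for example, when there is no query).
  • Requesting too many fields in _source, increasing response size unnecessarily.
  • Using top_hits without a parent bucket aggregation, which returns the top hits across all matching documents rather than per group.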
Ungrouped: top_hits at the top level returns hits from the whole result set.
json
{
  "aggs": {
    "all_hits": {
      "top_hits": {
        "size": 5
      }
    }
  }
}

Better: nest top_hits inside a bucket aggregation.
json
{
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": {
        "top_category_hits": {
          "top_hits": { "size": 3, "sort": [{"date": {"order": "desc"}}] }
        }
      }
    }
  }
}
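Some of these pitfalls can be caught before a query is sent. The following is a rough Python sketch of such a check, under stated assumptions: the function name find_top_hits_pitfalls and the max_size threshold of 10 are arbitrary choices for illustration, not Elasticsearch limits, and the check only inspects the request body as a dictionary.

```python
def find_top_hits_pitfalls(body, max_size=10):
    """Return warning strings for top_hits aggregations in a search body.

    max_size is an arbitrary threshold chosen for this sketch.
    """
    warnings = []

    def walk(aggs, nested):
        for name, agg in aggs.items():
            if "top_hits" in agg:
                opts = agg["top_hits"]
                if not nested:
                    warnings.append(f"{name}: not inside a bucket aggregation")
                if "sort" not in opts:
                    warnings.append(f"{name}: no explicit sort")
                # size defaults to 3 when it is not set
                if opts.get("size", 3) > max_size:
                    warnings.append(f"{name}: size exceeds {max_size}")
            # Anything below a parent aggregation counts as nested
            sub = agg.get("aggs") or agg.get("aggregations")
            if sub:
                walk(sub, nested=True)

    walk(body.get("aggs") or body.get("aggregations") or {}, nested=False)
    return warnings


# The two query bodies shown above
ungrouped = {"aggs": {"all_hits": {"top_hits": {"size": 5}}}}
grouped_query = {
    "aggs": {
        "by_category": {
            "terms": {"field": "category.keyword"},
            "aggs": {
                "top_category_hits": {
                    "top_hits": {"size": 3, "sort": [{"date": {"order": "desc"}}]}
                }
            },
        }
    }
}

for warning in find_top_hits_pitfalls(ungrouped):
    print(warning)
print(find_top_hits_pitfalls(grouped_query))
```

A check like this is only a convenience for code review or tests; a top-level top_hits is still a valid query when ungrouped hits are what you want.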
📊

Quick Reference

  • top_hits: Returns actual documents per bucket.
  • size: Limits number of documents returned per bucket.
  • sort: Controls order of returned documents.
  • _source: Selects which fields to include in results.
  • Nest top_hits inside a bucket aggregation like terms to get hits per group.

Key Takeaways

Use top_hits inside a bucket aggregation to get sample documents per group.
Set size and sort to control how many and which documents are returned.
Limit _source fields to reduce response size and improve performance.
Avoid using top_hits on its own, without a parent aggregation, when you need grouped results.
Keep size small to prevent slow queries and large responses.