How to Use top_hits Aggregation in Elasticsearch
Use the top_hits aggregation in Elasticsearch to get the most relevant or recent documents within each bucket of a parent aggregation. It returns the actual documents instead of just counts or metrics, making it useful for showing sample hits per group.
Syntax
The top_hits aggregation is nested inside a parent aggregation like terms. It has options like size to limit returned hits, sort to order them, and _source to control which fields to return.
- size: Number of top documents to return per bucket (defaults to 3).
- sort: How to order the documents (e.g., by date descending).
- _source: Fields to include in the returned documents.
json
{
"aggs": {
"group_by_field": {
"terms": {
"field": "field_name"
},
"aggs": {
"top_hits_example": {
"top_hits": {
"size": 3,
"sort": [
{"date": {"order": "desc"}}
],
"_source": ["field1", "field2"]
}
}
}
}
}
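}
Example
This example groups documents by the user.keyword field and returns the 2 most recent documents per user, showing only the message and timestamp fields. The top-level size of 0 suppresses the regular search hits so that the response contains only the aggregation results.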
json
{
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "user.keyword"
},
"aggs": {
"top_user_hits": {
"top_hits": {
"size": 2,
"sort": [
{"timestamp": {"order": "desc"}}
],
"_source": ["message", "timestamp"]
}
}
}
}
}
}
Output
{
"aggregations": {
"users": {
"buckets": [
{
"key": "alice",
"doc_count": 5,
"top_user_hits": {
"hits": {
"total": {"value": 5},
"hits": [
{"_source": {"message": "Hello world", "timestamp": "2024-06-01T12:00:00"}},
{"_source": {"message": "Another message", "timestamp": "2024-05-30T09:00:00"}}
]
}
}
},
{
"key": "bob",
"doc_count": 3,
"top_user_hits": {
"hits": {
"total": {"value": 3},
"hits": [
{"_source": {"message": "Bob's latest", "timestamp": "2024-06-02T15:00:00"}},
{"_source": {"message": "Bob's earlier", "timestamp": "2024-05-28T08:00:00"}}
]
}
}
}
]
}
}
}
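To use this response in application code, you typically walk the buckets and pull the _source of each hit. The following is a minimal Python sketch of that traversal, assuming the response has already been parsed into a dict. The helper name top_hits_by_bucket is hypothetical, and the sample data is the trimmed output shown above, not a full Elasticsearch response.

```python
# Trimmed response copied from the Output above (a real response has more fields).
response = {
    "aggregations": {
        "users": {
            "buckets": [
                {
                    "key": "alice",
                    "doc_count": 5,
                    "top_user_hits": {"hits": {"total": {"value": 5}, "hits": [
                        {"_source": {"message": "Hello world", "timestamp": "2024-06-01T12:00:00"}},
                        {"_source": {"message": "Another message", "timestamp": "2024-05-30T09:00:00"}},
                    ]}},
                },
                {
                    "key": "bob",
                    "doc_count": 3,
                    "top_user_hits": {"hits": {"total": {"value": 3}, "hits": [
                        {"_source": {"message": "Bob's latest", "timestamp": "2024-06-02T15:00:00"}},
                        {"_source": {"message": "Bob's earlier", "timestamp": "2024-05-28T08:00:00"}},
                    ]}},
                },
            ]
        }
    }
}


def top_hits_by_bucket(response, bucket_agg, hits_agg):
    """Map each bucket key to the list of _source documents in its top_hits.

    bucket_agg and hits_agg are the aggregation names used in the request
    ("users" and "top_user_hits" in the example above).
    """
    result = {}
    for bucket in response["aggregations"][bucket_agg]["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"] for hit in hits]
    return result


latest = top_hits_by_bucket(response, "users", "top_user_hits")
for user, docs in latest.items():
    for doc in docs:
        print(user, doc["timestamp"], doc["message"])
```

Note that the hits sit two levels down (hits.hits) under the name you gave the top_hits aggregation, which is easy to miss when reading the response by hand.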
Common Pitfalls
Common mistakes when using top_hits include:
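- Setting size too high, which slows queries and inflates the response. By default, Elasticsearch rejects a top_hits request whose from + size exceeds 100 (the index.max_inner_result_window setting).
- Omitting sort. Hits are then ordered by the relevance score of the main query, which is rarely the order you want when looking for the most recent documents.
- Requesting too many fields in _source, which increases response size unnecessarily.
- Using top_hits without a parent bucket aggregation, which returns hits for the whole index rather than per group.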
json
{
"aggs": {
"all_hits": {
"top_hits": {
"size": 5
}
}
}
}
// Better to use with a bucket aggregation:
{
"aggs": {
"by_category": {
"terms": { "field": "category.keyword" },
"aggs": {
"top_category_hits": {
"top_hits": { "size": 3, "sort": [{"date": {"order": "desc"}}] }
}
}
}
}
}
Quick Reference
- top_hits: Returns actual documents per bucket.
- size: Limits number of documents returned per bucket.
- sort: Controls order of returned documents.
- _source: Selects which fields to include in results.
- Always use top_hits inside a bucket aggregation like terms.
Key Takeaways
Use top_hits inside a bucket aggregation to get sample documents per group.
Set size and sort to control how many and which documents are returned.
Limit _source fields to reduce response size and improve performance.
Do not use top_hits on its own when you need grouped results; without a parent bucket aggregation it returns hits for the whole index.
Keep size small to prevent slow queries and large responses.
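The takeaways above can be folded into a small request builder. This is only a sketch: the function name build_top_hits_request, its parameters, and the aggregation names groups and top_docs are hypothetical choices for illustration, not part of Elasticsearch. It produces the same request shape as the earlier example and uses only the Python standard library.

```python
import json


def build_top_hits_request(group_field, sort_field, source_fields, size=3):
    """Build a search body with top_hits nested under a terms aggregation.

    The aggregation names "groups" and "top_docs" are arbitrary labels
    chosen for this sketch.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return {
        "size": 0,  # skip regular search hits; only aggregations are needed
        "aggs": {
            "groups": {
                # Parent bucket aggregation: one bucket per distinct value
                "terms": {"field": group_field},
                "aggs": {
                    "top_docs": {
                        "top_hits": {
                            "size": size,  # keep small
                            "sort": [{sort_field: {"order": "desc"}}],
                            "_source": list(source_fields),  # only needed fields
                        }
                    }
                },
            }
        },
    }


body = build_top_hits_request(
    "user.keyword", "timestamp", ["message", "timestamp"], size=2
)
print(json.dumps(body, indent=2))
```

Building the body in one place makes it harder to forget sort or to send top_hits without its parent aggregation, which are the two mistakes most likely to produce misleading results.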