How Scoring Works in Elasticsearch: Explained Simply
In Elasticsearch, scoring measures how well each document matches a search query, using the BM25 algorithm by default. The score is a number used to rank documents by relevance; it takes into account term frequency, inverse document frequency, and field length. Higher scores mean better matches.
Syntax
Scoring happens automatically when you run a query such as match or query_string in the query part of a search request. You can also customize scoring using function_score or script_score.
Basic query syntax with scoring:
    {
      "query": {
        "match": {
          "field_name": "search terms"
        }
      }
    }
Here, Elasticsearch calculates a _score for each document based on how well it matches the query.
Example
This example searches an index called products for documents that contain the term coffee in the description field. Elasticsearch returns the matching documents, each with a _score indicating its relevance.
    {
      "query": {
        "match": {
          "description": "coffee"
        }
      }
    }
Output
    {
      "hits": {
        "total": 3,
        "hits": [
          {
            "_id": "1",
            "_score": 1.2345,
            "_source": {"description": "Fresh coffee beans"}
          },
          {
            "_id": "2",
            "_score": 0.9876,
            "_source": {"description": "Coffee maker machine"}
          },
          {
            "_id": "3",
            "_score": 0.5432,
            "_source": {"description": "Tea and coffee set"}
          }
        ]
      }
    }
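As a small sketch of how a client might read these scores, the Python snippet below parses a response shaped like the output above and lists the hits in ranked order. The helper name `ranked_hits` is hypothetical, and the response text is simply copied from the example rather than fetched from a live cluster.

```python
import json

# Response body copied from the example output above (not fetched from a cluster).
response_text = """
{
  "hits": {
    "total": 3,
    "hits": [
      {"_id": "1", "_score": 1.2345, "_source": {"description": "Fresh coffee beans"}},
      {"_id": "2", "_score": 0.9876, "_source": {"description": "Coffee maker machine"}},
      {"_id": "3", "_score": 0.5432, "_source": {"description": "Tea and coffee set"}}
    ]
  }
}
"""


def ranked_hits(response):
    """Return (id, score, description) tuples in the order the hits were returned."""
    return [
        (hit["_id"], hit["_score"], hit["_source"]["description"])
        for hit in response["hits"]["hits"]
    ]


response = json.loads(response_text)
ranking = ranked_hits(response)

for position, (doc_id, score, description) in enumerate(ranking, start=1):
    # The position in the list is what matters; the raw score is only
    # meaningful relative to the other hits of the same query.
    print(f"{position}. id={doc_id} score={score} description={description!r}")
```

The snippet relies only on the hit order and the `_score` field, which is why it treats the scores as a ranking rather than as absolute quality measures.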
Common Pitfalls
Common mistakes when working with scoring in Elasticsearch include:
- Expecting scores to be absolute values rather than relative rankings.
- Ignoring that scores depend on the index data and query type.
- Not normalizing scores when combining multiple queries or functions.
- Using filters instead of queries when you want scoring, since filters do not score.
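The normalization point in the list above can be sketched on the client side. The Python snippet below is one possible approach, not a built-in Elasticsearch feature: it rescales the raw scores of two separate queries to the 0..1 range with min-max normalization and then combines them with a weighted sum. The function names, weights, and score values are all hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores are equal: treat every document as an equally good match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def combine(results_a, results_b, weight_a=0.5, weight_b=0.5):
    """Merge two score mappings after normalizing each one separately."""
    norm_a = min_max_normalize(results_a)
    norm_b = min_max_normalize(results_b)
    combined = {
        doc_id: weight_a * norm_a.get(doc_id, 0.0) + weight_b * norm_b.get(doc_id, 0.0)
        for doc_id in set(norm_a) | set(norm_b)
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores from two separate queries on very different scales.
title_scores = {"1": 12.0, "2": 8.0, "3": 4.0}
description_scores = {"1": 0.2, "2": 0.9, "3": 0.5}

ranking = combine(title_scores, description_scores)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Without the normalization step, the first query would dominate the sum simply because its raw scores are larger, not because its matches are better.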
To fix scoring issues, put the clauses that should influence ranking in query context (for example, must) rather than in filter, and consider using function_score to customize scores.
    {
      "query": {
        "bool": {
          "filter": [
            { "term": { "category": "books" } }
          ],
          "must": {
            "match": { "title": "elasticsearch" }
          }
        }
      }
    }
This query scores only by the match part; the filter does not affect the score.
Quick Reference
| Concept | Description |
|---|---|
| _score | The relevance score assigned to each document by Elasticsearch. |
| BM25 | Default scoring algorithm considering term frequency, inverse document frequency, and field length. |
| Term Frequency (TF) | How often a term appears in a document; more appearances increase score. |
| Inverse Document Frequency (IDF) | Rarer terms across documents get higher weight. |
| Field Length Norm | Shorter fields with the term score higher than longer fields. |
| Function Score | Allows custom modification of scores using functions or scripts. |
| Filter Clause | Does not affect scoring; used to limit documents without scoring. |
| Query Clause | Used to calculate scores based on relevance. |
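To make the TF, IDF, and field length rows above more concrete, here is a simplified Python sketch of a BM25 score for a single term. It assumes the commonly documented form with k1 = 1.2 and b = 0.75, and all statistics are hypothetical. Real Elasticsearch scores also depend on per-shard statistics and on how field lengths are stored, so the numbers here will not match a live cluster.

```python
import math


def bm25_term_score(freq, field_len, avg_field_len, num_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Score one query term against one document field (simplified BM25)."""
    # IDF: terms that occur in fewer documents get a larger weight.
    idf = math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # TF with length normalization: repeated terms help with diminishing
    # returns, and fields longer than average are penalized.
    tf = freq / (freq + k1 * (1 - b + b * field_len / avg_field_len))
    return idf * tf


# Hypothetical statistics: 1,000 documents, average field length of 10 terms.
baseline = bm25_term_score(freq=1, field_len=10, avg_field_len=10,
                           num_docs=1000, docs_with_term=100)
more_occurrences = bm25_term_score(freq=3, field_len=10, avg_field_len=10,
                                   num_docs=1000, docs_with_term=100)
rarer_term = bm25_term_score(freq=1, field_len=10, avg_field_len=10,
                             num_docs=1000, docs_with_term=5)
shorter_field = bm25_term_score(freq=1, field_len=4, avg_field_len=10,
                                num_docs=1000, docs_with_term=100)

print(f"baseline:         {baseline:.4f}")
print(f"more occurrences: {more_occurrences:.4f}")
print(f"rarer term:       {rarer_term:.4f}")
print(f"shorter field:    {shorter_field:.4f}")
```

Each of the three variations changes one input relative to the baseline and should raise the score, matching the table: more occurrences of the term, a rarer term, and a shorter field.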
Key Takeaways
Elasticsearch uses the BM25 algorithm to calculate a relevance score called _score for each document.
Scores are relative and help rank documents by how well they match the query terms.
Filters do not affect scoring; use query clauses to influence scores.
You can customize scoring with function_score or script_score queries.
Understanding term frequency, inverse document frequency, and field length is key to interpreting scores.
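As an illustration of the function_score customization mentioned above, the Python sketch below builds a request body that multiplies the relevance score of a match query by a value derived from a numeric field. The `popularity` field and the helper name are hypothetical; `field_value_factor`, `modifier`, `missing`, and `boost_mode` are options of the function_score query, but check the documentation for your Elasticsearch version before relying on them.

```python
import json


def popularity_boosted_query(field, text, popularity_field="popularity"):
    """Build a search body that scales the match score by a numeric field."""
    return {
        "query": {
            "function_score": {
                # The inner query produces the base relevance score.
                "query": {"match": {field: text}},
                # Hypothetical numeric field; log1p dampens large values and
                # "missing" supplies a value for documents without the field.
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",
                    "missing": 1,
                },
                # Multiply the base score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


body = popularity_boosted_query("description", "coffee")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request in the same way as the plain match queries shown earlier.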