
How Scoring Works in Elasticsearch: Explained Simply

In Elasticsearch, scoring measures how well each document matches a search query. By default, Elasticsearch uses the BM25 algorithm, which combines term frequency, inverse document frequency, and field length into a relevance score called _score. Results are sorted by _score in descending order, so the best matches come first.
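For reference, the classic BM25 formula scores a query q against a document d as shown below. Here f(t, d) is how often term t occurs in the field, |d| is the field length in terms, avgdl is the average field length, N is the number of documents that have the field, and n(t) is the number of those containing t. Elasticsearch's defaults are k_1 = 1.2 and b = 0.75. Treat this as a textbook sketch: Lucene stores field lengths in a lossy one-byte encoding and, by default, computes N, n(t), and avgdl per shard, so a hand calculation can differ slightly from the reported _score. Setting "explain": true on a search request shows the actual per-term breakdown.

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
  \frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(1 + \frac{N - n(t) + 0.5}{n(t) + 0.5}\right)
```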


Syntax

Elasticsearch scores documents automatically for any query clause that runs in query context, such as match, multi_match, or query_string, inside the query part of a search request. You can customize scoring with the function_score or script_score queries.

Basic query syntax with scoring:

{
  "query": {
    "match": {
      "field_name": "search terms"
    }
  }
}

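Here, Elasticsearch calculates a _score for every matching document. The match query analyzes "search terms" into the terms search and terms and, by default, matches documents that contain either one; documents that contain both generally score higher.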



Example

This example shows a search query on an index called products where we search for documents with the term coffee in the description field. Elasticsearch returns documents with a _score indicating relevance.


{
  "query": {
    "match": {
      "description": "coffee"
    }
  }
}

Output (abridged; actual _score values depend on the index contents):

{
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "hits": [
      { "_id": "1", "_score": 1.2345, "_source": { "description": "Fresh coffee beans" } },
      { "_id": "2", "_score": 0.9876, "_source": { "description": "Coffee maker machine" } },
      { "_id": "3", "_score": 0.5432, "_source": { "description": "Tea and coffee set" } }
    ]
  }
}
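To see where numbers like these come from, the sketch below applies the BM25 formula to the three sample descriptions in plain Python. It assumes a single shard holding only these three documents and approximates the standard analyzer by lowercasing and splitting on whitespace; tokenize and bm25_scores are hypothetical helpers, not Elasticsearch APIs.

```python
import math

K1 = 1.2   # Elasticsearch's default BM25 k1 (term-frequency saturation)
B = 0.75   # Elasticsearch's default BM25 b (field-length normalization)


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def bm25_scores(query, docs, k1=K1, b=B):
    """Return {doc_id: score} for every doc containing at least one query term."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    scores = {}
    for term in tokenize(query):
        # n(t): how many documents contain the term
        df = sum(1 for toks in tokenized.values() if term in toks)
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, toks in tokenized.items():
            freq = toks.count(term)
            if freq == 0:
                continue
            # Longer-than-average fields get a larger denominator, so a lower score
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            term_score = idf * freq * (k1 + 1) / (freq + length_norm)
            # Scores of individual query terms are summed
            scores[doc_id] = scores.get(doc_id, 0.0) + term_score
    return scores


docs = {
    "1": "Fresh coffee beans",
    "2": "Coffee maker machine",
    "3": "Tea and coffee set",
}
ranked = sorted(bm25_scores("coffee", docs).items(), key=lambda kv: kv[1], reverse=True)
for doc_id, score in ranked:
    print(doc_id, round(score, 4))
```

Under these assumptions the two three-word descriptions tie and "Tea and coffee set" (four terms) ranks last, because the field length norm penalizes the longer field. The values are small because coffee appears in every document, which gives it a low IDF; real scores also depend on per-shard statistics and the field's actual analyzer.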


Common Pitfalls

Common mistakes when working with scoring in Elasticsearch include:

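Treating _score as an absolute measure of relevance; it is only meaningful for ranking hits within one query's results.
Ignoring that scores depend on the indexed data (by default, term statistics are computed per shard) and on the query type.
Combining scores from multiple queries or functions without normalizing them to a comparable scale.
Putting clauses in filter context when they should affect relevance; filter clauses never contribute to _score.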
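Raw _score values from different queries are not on the same scale, and a plain bool query simply adds clause scores together. One common client-side workaround, sketched here as an assumption rather than something Elasticsearch does for you, is min-max normalization of each result list before a weighted combination. The helpers min_max_normalize and combine and the sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale {doc_id: score} to the range [0, 1] within one result list."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine(results_a, results_b, weight_a=0.5, weight_b=0.5):
    """Weighted sum of two normalized result lists; a missing doc contributes 0."""
    a, b = min_max_normalize(results_a), min_max_normalize(results_b)
    return {
        doc_id: weight_a * a.get(doc_id, 0.0) + weight_b * b.get(doc_id, 0.0)
        for doc_id in set(a) | set(b)
    }


# Hypothetical scores from two different queries on very different scales.
title_hits = {"1": 12.4, "2": 8.1, "3": 3.0}
popularity_hits = {"2": 0.9, "3": 0.4, "4": 0.1}

combined = combine(title_hits, popularity_hits)
for doc_id, score in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(doc_id, round(score, 3))
```

Without normalization, the popularity scores would barely move the ranking next to title scores in the 3 to 12 range; after normalization both lists carry the weight you assign them, and document 2 moves ahead of document 1.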

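To avoid these problems, put clauses that should influence relevance in query context (for example, must or should in a bool query), keep pure yes/no conditions in filter, and use function_score when you need to adjust scores. For example: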


{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category": "books" } }
      ],
      "must": {
        "match": { "title": "elasticsearch" }
      }
    }
  }
}

Only the match clause on title contributes to _score; the term filter on category only decides which documents are returned.
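To make the difference concrete, the following Python simulation mimics how that bool query treats a few documents. The Doc class and bool_query function are hypothetical; each document's title_score stands in for whatever the match clause would score it, since computing that is Elasticsearch's job.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    category: str
    title_score: float  # hypothetical score of the match clause on "title" (0.0 = no match)


def bool_query(docs, category):
    """Mimic bool {filter: term(category), must: match(title)} on precomputed scores."""
    hits = []
    for doc in docs:
        if doc.category != category:
            continue  # filter: excludes the doc but never adds to the score
        if doc.title_score <= 0.0:
            continue  # must: the doc also has to match the scoring clause
        hits.append((doc.doc_id, doc.title_score))  # score comes from must alone
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


docs = [
    Doc("a", "books", 2.5),
    Doc("b", "books", 0.8),
    Doc("c", "music", 3.1),  # best title match, but removed by the filter
    Doc("d", "books", 0.0),  # passes the filter, but fails the must clause
]
print(bool_query(docs, "books"))
```

Document c has the best title match but is excluded by the filter, and the filter adds nothing to the scores of a and b. If the bool query contained only the filter clause, Elasticsearch would return the matching documents with a _score of 0.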


Quick Reference

Concept	Description
_score	The relevance score Elasticsearch assigns to each matching document.
BM25	Default similarity algorithm since Elasticsearch 5.0; combines term frequency, inverse document frequency, and field length.
Term Frequency (TF)	How often a term appears in the field; more occurrences raise the score, with diminishing returns.
Inverse Document Frequency (IDF)	Terms that are rarer across the index get more weight.
Field Length Norm	A match in a shorter field scores higher than the same match in a longer field.
Function Score	The function_score query, which modifies scores with functions such as field_value_factor, decay functions, or scripts.
Filter Clause	Restricts which documents match without contributing to _score.
Query Clause	Runs in query context and contributes to _score.
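As an illustration of the Function Score row, the sketch below reproduces the arithmetic of function_score's field_value_factor function combined with the default boost_mode of multiply: the field value is multiplied by factor, passed through the modifier, and the result multiplies the query's _score. Note that Elasticsearch's log1p modifier uses the common (base-10) logarithm. The helper names and the likes values are hypothetical, and options such as weight, max_boost, score_mode, and decay functions are left out.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Sketch of field_value_factor: modifier(factor * value)."""
    if value is None:
        if missing is None:
            raise ValueError("document has no value and no 'missing' default")
        value = missing  # factor and modifier still apply to the default
    x = factor * value
    if modifier == "none":
        return x
    if modifier == "log1p":
        return math.log10(1 + x)  # base-10 log of (1 + x)
    if modifier == "sqrt":
        return math.sqrt(x)
    raise ValueError(f"modifier not covered by this sketch: {modifier}")


def function_score(query_score, func_value, boost_mode="multiply"):
    """Combine the query's _score with the function result."""
    if boost_mode == "multiply":
        return query_score * func_value
    if boost_mode == "sum":
        return query_score + func_value
    if boost_mode == "replace":
        return func_value
    raise ValueError(f"boost_mode not covered by this sketch: {boost_mode}")


# Two hypothetical products whose text matched equally well (same _score).
query_score = 1.5
for likes in (9, 99):
    print(likes, function_score(query_score, field_value_factor(likes, modifier="log1p")))
```

With log1p, ten times as many likes only doubles the boost here, which keeps popularity from swamping text relevance.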


Key Takeaways

By default, Elasticsearch uses the BM25 algorithm to calculate a relevance score called _score for each document.

Scores are relative and help rank documents by how well they match the query terms.

Filters do not affect scoring; use query clauses to influence scores.

You can customize scoring with function_score or script_score queries.

Understanding term frequency, inverse document frequency, and field length is key to interpreting scores.