Given an Elasticsearch index with documents containing a field text, what is the expected BM25 score output for the query {"match": {"text": "quick brown"}} on the document {"text": "the quick brown fox"}?
Assume default BM25 parameters and that the document is the only one in the index.
{
  "query": {
    "match": {
      "text": "quick brown"
    }
  }
}
BM25 scores documents based on term frequency and inverse document frequency. The score is always positive.
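BM25 produces a positive relevance score that reflects how well the document matches the query terms. The document contains both "quick" and "brown", so each term contributes to the score. With a single document in the index, each term's IDF is ln(1 + 0.5/1.5) ≈ 0.288 and the document length equals the average length, so under the default parameters (k1 = 1.2, b = 0.75) the total is roughly 0.58 rather than 1.0.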
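The single-document case can be checked by hand. The sketch below assumes the commonly cited BM25 form with the (k1 + 1) factor in the numerator and the IDF ln(1 + (N - n + 0.5) / (n + 0.5)), with k1 = 1.2 and b = 0.75. The function name bm25_term_score is hypothetical, and the value reported by a particular Elasticsearch version may differ slightly.

```python
import math

def bm25_term_score(freq, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document with the textbook BM25 formula."""
    # Inverse document frequency: rarer terms get a larger weight.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: equals 1.0 when the document has average length.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Saturating term-frequency component.
    tf = freq * (k1 + 1) / (freq + k1 * length_norm)
    return idf * tf

# One document, "the quick brown fox": 4 tokens, so doc_len == avg_doc_len.
# (Assumes an analyzer that keeps "the" as a token; with a single document
# the length ratio is 1 either way.)
per_term = bm25_term_score(freq=1, doc_len=4, avg_doc_len=4, n_docs=1, doc_freq=1)
total = 2 * per_term  # "quick" and "brown" each contribute the same amount
print(round(per_term, 4), round(total, 4))  # 0.2877 0.5754
```

Running the same query with the explain option enabled is the reliable way to see the exact breakdown on a real cluster.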
TF-IDF and BM25 are both scoring algorithms used in Elasticsearch. Which aspect does BM25 improve compared to classic TF-IDF?
Think about how document length affects relevance scores.
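BM25 improves on classic TF-IDF by normalizing term frequency against the document's length relative to the average document length (tunable through b) and by saturating term frequency (tunable through k1). Longer documents therefore do not score higher merely because they contain more terms.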
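A small numeric sketch makes both effects visible. It isolates the term-frequency component of BM25 (IDF omitted) under the default k1 = 1.2 and b = 0.75; the helper name bm25_tf and the document lengths are illustrative assumptions.

```python
def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of BM25 (IDF omitted)."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return freq * (k1 + 1) / (freq + k1 * length_norm)

avg_len = 100

# Length normalization: same term count, different document lengths.
short_doc = bm25_tf(freq=3, doc_len=50, avg_doc_len=avg_len)
long_doc = bm25_tf(freq=3, doc_len=400, avg_doc_len=avg_len)
print(short_doc > long_doc)  # True: the longer document gets the smaller contribution

# Saturation: the second occurrence adds more than the eleventh.
gain_low = bm25_tf(2, 100, avg_len) - bm25_tf(1, 100, avg_len)
gain_high = bm25_tf(11, 100, avg_len) - bm25_tf(10, 100, avg_len)
print(gain_low > gain_high)  # True
```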
Consider this Elasticsearch query using BM25 scoring:
{
  "query": {
    "bool": {
      "must": [
        {"match": {"content": "apple orange"}}
      ]
    }
  }
}
The index contains documents with the field content, but all returned documents have a score of 0. What is the most likely cause?
Check the field mapping type for content.
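BM25 relevance scoring is designed for text fields that pass through a full-text analyzer. If content is mapped as keyword, the whole value is indexed as a single unanalyzed token, so the query "apple orange" is not split into words and cannot be matched or ranked term by term. A keyword mapping is therefore the most likely reason the scores carry no relevance signal.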
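The difference between the two mappings can be sketched with a toy model of analysis. This is a deliberately simplified stand-in, not Elasticsearch's analyzer: analyze_text and analyze_keyword are hypothetical helpers, and the sample value is made up.

```python
def analyze_text(value):
    """Rough stand-in for a full-text analyzer: lowercase and split on whitespace."""
    return value.lower().split()

def analyze_keyword(value):
    """Keyword fields index the whole value as a single, unmodified token."""
    return [value]

doc_value = "Apple pie with orange zest"
query = "apple orange"

# With a text mapping, query and document share individual word tokens.
text_hits = set(analyze_text(query)) & set(analyze_text(doc_value))
# With a keyword mapping, only an exact, whole-value match would count.
keyword_hits = set(analyze_keyword(query)) & set(analyze_keyword(doc_value))

print(sorted(text_hits))     # ['apple', 'orange']
print(sorted(keyword_hits))  # []
```

Checking the index mapping for the field is the quickest way to confirm which of the two cases applies.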
Elasticsearch uses BM25 by default. Which snippet correctly switches the body field from BM25 to classic TF-IDF scoring?
Similarity is configured per field, inside the field object of the mapping, not per query.
Option B correctly sets the similarity parameter inside the definition of the body field, enabling classic TF-IDF scoring. The other options either place similarity outside the field or use invalid syntax. Note that the built-in classic similarity was removed in Elasticsearch 7.0, so this setting applies to earlier versions.
You want to adjust BM25 scoring in Elasticsearch to give more importance to how often a term appears in a document, and less importance to the document length. Which parameter settings achieve this?
Remember b controls length normalization and k1 controls term frequency saturation.
Setting b to 0 disables length normalization entirely, and values close to 0 reduce it, so document length has less effect. Increasing k1 delays term-frequency saturation, so repeated occurrences of a term keep adding to the score for longer.
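As a rough illustration, the sketch below builds an index body that registers a custom BM25 similarity and assigns it to a field, then compares the default parameters with a tuned pair. The similarity name tf_heavy_bm25, the field name body, and the values k1 = 2.0 and b = 0.1 are assumptions chosen for the example, not recommended settings.

```python
import json

# Hypothetical index body: a custom BM25 similarity attached to one text field.
index_body = {
    "settings": {
        "index": {
            "similarity": {
                "tf_heavy_bm25": {"type": "BM25", "k1": 2.0, "b": 0.1}
            }
        }
    },
    "mappings": {
        "properties": {
            "body": {"type": "text", "similarity": "tf_heavy_bm25"}
        }
    },
}
print(json.dumps(index_body, indent=2))

def bm25_tf(freq, doc_len, avg_doc_len, k1, b):
    """Term-frequency component of BM25 (IDF omitted)."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return freq * (k1 + 1) / (freq + k1 * length_norm)

for label, k1, b in [("default", 1.2, 0.75), ("tuned", 2.0, 0.1)]:
    # How much more 10 occurrences count than 1, at average document length.
    tf_gain = bm25_tf(10, 100, 100, k1, b) / bm25_tf(1, 100, 100, k1, b)
    # How much a short document (50) outscores a long one (400) at equal frequency.
    length_penalty = bm25_tf(3, 50, 100, k1, b) / bm25_tf(3, 400, 100, k1, b)
    print(label, round(tf_gain, 2), round(length_penalty, 2))
```

With the tuned values, the term-frequency gain is larger and the gap between short and long documents is smaller, which matches the goal stated in the question.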