
TF-IDF and BM25 scoring in Elasticsearch - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the BM25 score output for a simple match query?

Given an Elasticsearch index with documents containing a field text, what is the expected BM25 score output for the query {"match": {"text": "quick brown"}} on the document {"text": "the quick brown fox"}?

Assume default BM25 parameters and that the document is the only one in the index.

Elasticsearch
{
  "query": {
    "match": {
      "text": "quick brown"
    }
  }
}
A. A positive float score below 1.0
B. Zero, because the query terms are stop words
C. A negative score, since BM25 can produce negative values
D. An error because BM25 does not support match queries
💡 Hint

BM25 scores documents from term frequency, inverse document frequency, and document length. In Elasticsearch, a matching document never receives a negative BM25 score.
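
To make the arithmetic behind this question concrete, here is a minimal sketch of the textbook BM25 formula applied to the one-document index. It assumes whitespace tokenization, no stop-word removal, and the variant that keeps the (k1 + 1) factor in the numerator; some Lucene versions drop that constant, which rescales scores without changing the ranking. The function name bm25_term_score is illustrative, not an Elasticsearch API.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, n_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document."""
    # IDF with the "1 +" inside the log, which keeps the value positive.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term-frequency part: k1 controls saturation, b controls length normalization.
    norm = 1 - b + b * doc_len / avg_doc_len
    tf = freq * (k1 + 1) / (freq + k1 * norm)
    return idf * tf


# One document in the index, so its length is also the average length.
doc_tokens = "the quick brown fox".split()
query_terms = ["quick", "brown"]

score = sum(
    bm25_term_score(
        freq=doc_tokens.count(term),
        doc_len=len(doc_tokens),
        avg_doc_len=len(doc_tokens),
        n_docs=1,
        docs_with_term=1,
    )
    for term in query_terms
)
print(round(score, 4))  # prints 0.5754
```

Each term contributes ln(4/3), about 0.2877, because the term-frequency part works out to exactly 1 when the document has average length and the term occurs once. The real score can be checked by adding "explain": true to the search request.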

🧠 Conceptual
intermediate
Which factor does TF-IDF consider that BM25 improves upon?

TF-IDF and BM25 are both scoring algorithms used in Elasticsearch. Which aspect does BM25 improve compared to classic TF-IDF?

A. BM25 uses only exact phrase matches instead of individual terms
B. BM25 removes inverse document frequency to simplify scoring
C. BM25 ignores term frequency to speed up scoring
D. BM25 adds term frequency saturation and tunable document length normalization to better handle longer documents
💡 Hint

Think about how document length affects relevance scores.
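
The effect of document length can be sketched with the term-frequency part of BM25 alone. The example below assumes the textbook formula with default k1 = 1.2 and b = 0.75; the document lengths and the helper name bm25_tf are made up for illustration.

```python
def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of textbook BM25 (IDF left out)."""
    # Documents longer than average get norm > 1, which lowers the result.
    norm = 1 - b + b * doc_len / avg_doc_len
    return freq * (k1 + 1) / (freq + k1 * norm)


avg_len = 100  # hypothetical average field length in tokens

# Same term count, different document lengths.
short_doc = bm25_tf(freq=3, doc_len=50, avg_doc_len=avg_len)
long_doc = bm25_tf(freq=3, doc_len=400, avg_doc_len=avg_len)
print("short:", round(short_doc, 3), "long:", round(long_doc, 3))

# Saturation: even a huge term count stays below k1 + 1 = 2.2,
# whereas a raw term count would keep growing.
saturated = bm25_tf(freq=1000, doc_len=100, avg_doc_len=avg_len)
print("saturated:", round(saturated, 3))
```

Three occurrences in a 50-token document count for more than three occurrences in a 400-token document, and repeating a term many times yields diminishing returns.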

🔧 Debug
advanced
Why does this Elasticsearch query return zero scores with BM25?

Consider this Elasticsearch query using BM25 scoring:

{
  "query": {
    "bool": {
      "must": [
        {"match": {"content": "apple orange"}}
      ]
    }
  }
}

The index contains documents with the field content, but all returned documents have a score of 0. What is the most likely cause?

A. The field content is mapped as keyword type, so BM25 scoring does not apply
B. BM25 scoring always returns zero for boolean queries
C. The query syntax is invalid and causes scoring to reset to zero
D. The documents do not contain the terms "apple" or "orange"
💡 Hint

Check the field mapping type for content.
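
One way to follow this hint is to look at the response of GET /<index>/_mapping and read the type of the field. The sketch below works on a hand-written sample of such a response; the index name fruit-index and the helper field_type are hypothetical. Keep in mind that a match query against a keyword field is not analyzed, so "apple orange" is looked up as one exact value rather than two terms.

```python
def field_type(mapping_response, index, field):
    """Return the mapped type of a top-level field from a get-mapping response."""
    properties = mapping_response[index]["mappings"]["properties"]
    # Object fields may carry no explicit "type" key, hence .get().
    return properties[field].get("type")


# Hand-written sample shaped like a get-mapping response (not real output).
sample_response = {
    "fruit-index": {
        "mappings": {
            "properties": {
                "content": {"type": "keyword"}
            }
        }
    }
}

print(field_type(sample_response, "fruit-index", "content"))  # prints keyword
```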

📝 Syntax
advanced
Which query correctly disables BM25 and uses classic TF-IDF in Elasticsearch?

Elasticsearch uses BM25 by default. Which query snippet correctly disables BM25 and enables classic TF-IDF scoring for the body field?

A. { "match": { "body": { "query": "search text" } }, "similarity": { "type": "classic" } }
B. { "match": { "body": { "query": "search text", "similarity": "classic" } } }
C. { "match": { "body": "search text" }, "similarity": "classic" }
D. { "match": { "body": { "query": "search text" } }, "similarity": "classic" }
💡 Hint

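In Elasticsearch itself, similarity is normally configured per field in the index mapping rather than in the query body. Among these options, look for the one that nests the setting inside the field object.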
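
For reference, a per-field similarity is usually declared when the index is created. The sketch below builds such a request body as a Python dict and prints it as JSON; the similarity name my_bm25 and the helper index_body are hypothetical. It uses the BM25 type because the built-in classic type appears to have been removed in Elasticsearch 7.0, so check the version in use before relying on it.

```python
import json


def index_body(field, similarity_name, k1=1.2, b=0.75):
    """Build a create-index body that attaches a named similarity to one text field."""
    return {
        "settings": {
            "index": {
                "similarity": {
                    # Named similarity defined once at index level.
                    similarity_name: {"type": "BM25", "k1": k1, "b": b}
                }
            }
        },
        "mappings": {
            "properties": {
                # The field refers to the similarity by name.
                field: {"type": "text", "similarity": similarity_name}
            }
        },
    }


body = index_body("body", "my_bm25")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a create-index request; the match query itself then needs no similarity key at all.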

🚀 Application
expert
How to tune BM25 parameters to favor term frequency over document length?

You want to adjust BM25 scoring in Elasticsearch to give more importance to how often a term appears in a document, and less importance to the document length. Which parameter settings achieve this?

A. Set both b and k1 to 0
B. Set b close to 1 and k1 to a lower value like 0.5
C. Set b close to 0 and k1 to a higher value like 2.0
D. Set b to 0.75 and k1 to 1.2 (default values)
💡 Hint

Remember b controls length normalization and k1 controls term frequency saturation.
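
The roles of b and k1 can be checked numerically. The sketch below compares the default parameters with one candidate setting (k1 = 2.0, b = 0) using the term-frequency part of textbook BM25; the document lengths, term counts, and helper names are invented for illustration.

```python
def bm25_tf(freq, doc_len, avg_doc_len, k1, b):
    """Term-frequency component of textbook BM25 (IDF left out)."""
    norm = 1 - b + b * doc_len / avg_doc_len
    return freq * (k1 + 1) / (freq + k1 * norm)


def length_penalty(k1, b):
    """Score of a 500-token document relative to a 100-token one; 1.0 means no penalty."""
    return bm25_tf(5, 500, 100, k1, b) / bm25_tf(5, 100, 100, k1, b)


def frequency_gain(k1, b):
    """How much more 10 occurrences count than 1 in an average-length document."""
    return bm25_tf(10, 100, 100, k1, b) / bm25_tf(1, 100, 100, k1, b)


for label, k1, b in [("defaults", 1.2, 0.75), ("k1=2.0, b=0", 2.0, 0.0)]:
    print(label,
          "length penalty:", round(length_penalty(k1, b), 3),
          "frequency gain:", round(frequency_gain(k1, b), 3))
```

With b = 0 the length term drops out entirely, and a larger k1 delays saturation so that additional occurrences keep adding to the score for longer.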