Challenge - 5 Problems

TF-IDF and BM25 Master

❓ Predict Output

intermediate

What is the BM25 score output for a simple match query?

Given an Elasticsearch index whose documents have a field named text, what BM25 score do you expect for the query {"match": {"text": "quick brown"}} against the document {"text": "the quick brown fox"}?

Assume default BM25 parameters and that the document is the only one in the index.

Elasticsearch

{
  "query": {
    "match": {
      "text": "quick brown"
    }
  }
}

A. A positive float score below 1.0

B. Zero, because the query terms are stop words

C. A negative score, since BM25 can produce negative values

D. An error because BM25 does not support match queries
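
For intuition about the magnitude, here is a minimal Python sketch of the textbook BM25 formula applied to this single-document case. It assumes whitespace tokenization, default k1 = 1.2 and b = 0.75, and the (k1 + 1) numerator; the function name `bm25_term_score` is hypothetical, and a real Elasticsearch cluster may differ in small details such as length encoding or per-shard statistics.

```python
import math

def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term (an illustrative sketch)."""
    # Inverse document frequency, always positive in this form.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: equals k1 when the document has average length.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Saturating term frequency component.
    tf = freq * (k1 + 1) / (freq + norm)
    return idf * tf

doc_tokens = "the quick brown fox".split()
query_terms = "quick brown".split()

# One document in the index, so doc_count = doc_freq = 1 and avg length = its own length.
score = sum(
    bm25_term_score(doc_tokens.count(term), len(doc_tokens), len(doc_tokens), 1, 1)
    for term in query_terms
    if term in doc_tokens
)
print(round(score, 4))  # 0.5754
```

Under these assumptions each matching term contributes ln(4/3), so the total is positive and well below 1.0.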

🧠 Conceptual

intermediate

Which factor does TF-IDF consider that BM25 improves upon?

TF-IDF and BM25 are both scoring algorithms used in Elasticsearch. Which aspect does BM25 improve compared to classic TF-IDF?

A. BM25 uses only exact phrase matches instead of individual terms

B. BM25 removes inverse document frequency to simplify scoring

C. BM25 ignores term frequency to speed up scoring

D. BM25 adds document length normalization to better handle longer documents
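
To make the comparison concrete, the sketch below contrasts a classic term frequency component with the BM25 one. It assumes Lucene's classic similarity, which uses sqrt(freq), and the textbook BM25 form with default k1 and b; the function names are hypothetical and IDF is left out so that only the term frequency and length effects are visible.

```python
import math

def classic_tf(freq):
    # Classic TF-IDF term frequency as in Lucene's ClassicSimilarity: grows without bound.
    return math.sqrt(freq)

def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # BM25 term frequency: saturates at k1 + 1 and is scaled by relative document length.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)

for freq in (1, 5, 50, 500):
    # Columns: raw frequency, classic component, BM25 component for an average-length document.
    print(freq, round(classic_tf(freq), 2), round(bm25_tf(freq, 100, 100), 2))

# The same term count is worth less in a document ten times longer than average.
print(round(bm25_tf(5, 1000, 100), 2), round(bm25_tf(5, 100, 100), 2))
```

The classic component keeps growing as the term repeats, while the BM25 component flattens out below k1 + 1 and shrinks as the document gets longer relative to the average.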

🔧 Debug

advanced

Why does this Elasticsearch query return zero scores with BM25?

Consider this Elasticsearch query using BM25 scoring:

{
  "query": {
    "bool": {
      "must": [
        {"match": {"content": "apple orange"}}
      ]
    }
  }
}

The index contains documents with the field content, but all returned documents have a score of 0. What is the most likely cause?

A. The field <code>content</code> is mapped as <code>keyword</code> type, so BM25 scoring does not apply

B. BM25 scoring always returns zero for boolean queries

C. The query syntax is invalid and causes scoring to reset to zero

D. The documents do not contain the terms "apple" or "orange"

📝 Syntax

advanced

Which query correctly disables BM25 and uses classic TF-IDF in Elasticsearch?

Elasticsearch uses BM25 by default. Which query snippet correctly disables BM25 and enables classic TF-IDF scoring for the body field?

A. { "match": { "body": { "query": "search text" } }, "similarity": { "type": "classic" } }

B. { "match": { "body": { "query": "search text", "similarity": "classic" } } }

C. { "match": { "body": "search text" }, "similarity": "classic" }

D. { "match": { "body": { "query": "search text" } }, "similarity": "classic" }
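
For context when comparing the snippets, the similarity is normally chosen per field in the index mapping rather than inside the search request. The Python sketch below builds both request bodies as plain dictionaries under that assumption. The variable names are hypothetical, and the value "classic" is assumed to be accepted only by older Elasticsearch releases (6.x and earlier); newer releases ship "BM25" and "boolean" as the built-in choices.

```python
import json

# Assumption: "classic" is only valid on older releases; swap in "BM25" or "boolean" otherwise.
SIMILARITY = "classic"

# Body for creating the index: the similarity is a parameter of the field mapping.
index_body = {
    "mappings": {
        "properties": {
            "body": {"type": "text", "similarity": SIMILARITY}
        }
    }
}

# Body for searching: the match query itself carries no similarity setting.
search_body = {
    "query": {
        "match": {"body": {"query": "search text"}}
    }
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

Keeping the similarity in the mapping means every query against the field is scored the same way, which is why it is fixed at index creation in this sketch.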

🚀 Application

expert

How do you tune BM25 parameters to favor term frequency over document length?

You want to adjust BM25 scoring in Elasticsearch to give more importance to how often a term appears in a document, and less importance to the document length. Which parameter settings achieve this?

A. Set both <code>b</code> and <code>k1</code> to 0

B. Set <code>b</code> close to 1 and <code>k1</code> to a lower value like 0.5

C. Set <code>b</code> close to 0 and <code>k1</code> to a higher value like 2.0

D. Set <code>b</code> to 0.75 and <code>k1</code> to 1.2 (default values)
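
The effect of each parameter can be checked numerically. The sketch below uses the textbook BM25 term frequency component and a hypothetical custom similarity named `tf_heavy_bm25`. The settings layout follows the custom similarity form of the index settings; the specific values are illustrative assumptions, not recommendations.

```python
def bm25_tf(freq, doc_len, avg_doc_len, k1, b):
    # Textbook BM25 term frequency component (IDF omitted for clarity).
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)

# Hypothetical index body registering a custom BM25 similarity and using it on one field.
index_body = {
    "settings": {
        "index": {
            "similarity": {
                "tf_heavy_bm25": {"type": "BM25", "b": 0.0, "k1": 2.0}
            }
        }
    },
    "mappings": {
        "properties": {
            "body": {"type": "text", "similarity": "tf_heavy_bm25"}
        }
    },
}

# With b = 0 the document length no longer changes the score.
short_doc = bm25_tf(3, 50, 100, k1=2.0, b=0.0)
long_doc = bm25_tf(3, 5000, 100, k1=2.0, b=0.0)
print(short_doc == long_doc)  # True

# A higher k1 lets repeated terms keep adding to the score for longer.
gain_high_k1 = bm25_tf(10, 100, 100, 2.0, 0.0) / bm25_tf(1, 100, 100, 2.0, 0.0)
gain_low_k1 = bm25_tf(10, 100, 100, 0.5, 0.0) / bm25_tf(1, 100, 100, 0.5, 0.0)
print(round(gain_high_k1, 2), round(gain_low_k1, 2))  # 2.5 1.43
```

In this sketch, going from one to ten occurrences multiplies the component by 2.5 with k1 = 2.0 but only by about 1.43 with k1 = 0.5, and with b = 0 a 50-token and a 5000-token document score identically.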
