Recall & Review
beginner
What does TF-IDF stand for and what is its purpose in search engines?
TF-IDF stands for Term Frequency-Inverse Document Frequency. It estimates how important a word is to one document relative to the whole collection: words that appear often in that document but rarely in others get higher scores.
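To make the idea concrete, here is a minimal sketch of one common TF-IDF variant (raw term count times log(N / df)). The toy corpus and the helper name `tf_idf` are made up for illustration; real search engines use more refined formulas.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already lowercased and split on whitespace.
docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick quick fox jumps",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: the number of documents that contain each term.
df = Counter(term for tokens in tokenized for term in set(tokens))

def tf_idf(term, tokens):
    """Raw term count times log(N / df); 0.0 for terms not in the corpus."""
    if df[term] == 0:
        return 0.0
    tf = tokens.count(term)
    idf = math.log(n_docs / df[term])
    return tf * idf

for i, tokens in enumerate(tokenized):
    print(i, round(tf_idf("quick", tokens), 3), round(tf_idf("the", tokens), 3))
```

In this corpus "the" appears in every document, so its IDF is log(1) = 0 and it contributes nothing, while "quick" scores highest in the document that repeats it.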
intermediate
Explain the role of BM25 in Elasticsearch scoring.
BM25 is the ranking function Elasticsearch uses to score how well each document matches a search query. It improves on TF-IDF in two ways: term frequency saturates, so each extra occurrence of a term adds less to the score, and scores are normalized for document length.
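The sketch below shows the textbook form of the BM25 score for a single query term, to illustrate term frequency saturation. The function name `bm25_term_score` and the sample numbers are assumptions for illustration; `k1=1.2` and `b=0.75` are the commonly cited defaults, and the exact constants Elasticsearch applies internally may differ by version.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document's score."""
    # IDF: terms found in fewer documents (small doc_freq) get a larger weight.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: above 1 for longer-than-average documents.
    norm = 1 - b + b * doc_len / avg_doc_len
    # Saturating term-frequency part: always stays below k1 + 1.
    tf_part = tf * (k1 + 1) / (tf + k1 * norm)
    return idf * tf_part

# Same document length and term rarity, increasing term frequency.
for tf in (1, 2, 5, 20):
    print(tf, round(bm25_term_score(tf, 100, 100, 1000, 50), 3))
```

The printed scores keep rising with `tf`, but each step adds less than the one before, which is the saturation effect.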
beginner
How does term frequency (TF) affect document scoring?
Term frequency counts how often a word appears in a document. The more often a word appears, the more relevant the document is assumed to be for that word, which raises the document's score for queries containing it.
beginner
What is inverse document frequency (IDF) and why is it important?
Inverse document frequency measures how rare a word is across all documents. Rare words get higher IDF scores, so they have more impact on ranking, helping to highlight unique and meaningful terms.
intermediate
Why does BM25 use document length normalization?
BM25 adjusts scores based on document length to avoid favoring longer documents just because they have more words. This keeps scoring fair by balancing term frequency with document size.
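As a rough illustration of length normalization, the sketch below computes only the term-frequency part of the textbook BM25 formula for a short and a long document that contain the same term three times. The helper name `tf_component` and the lengths are hypothetical; setting `b=0.0` switches the length adjustment off.

```python
def tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency part of textbook BM25, adjusted for document length."""
    # norm is 1.0 for an average-length document and grows with doc_len.
    norm = 1 - b + b * doc_len / avg_doc_len
    return tf * (k1 + 1) / (tf + k1 * norm)

avg_len = 100  # assumed average document length in the index

short_doc = tf_component(3, 50, avg_len)   # 3 matches in 50 words
long_doc = tf_component(3, 400, avg_len)   # 3 matches in 400 words
print(round(short_doc, 3), round(long_doc, 3))

# With b=0.0 the document length is ignored, so both values are equal.
print(tf_component(3, 50, avg_len, b=0.0) == tf_component(3, 400, avg_len, b=0.0))
```

With the same number of matches, the short document gets the larger value because the term makes up a bigger share of its text.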
What does the 'IDF' part of TF-IDF measure?
IDF measures how rare or common a term is across all documents, giving higher weight to rare terms.
Which scoring method does Elasticsearch use by default?
Elasticsearch uses BM25 as its default scoring algorithm (since version 5.0) because it accounts for term frequency saturation and document length, which improves relevance.
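BM25 needs no setup because it is the default, but its two parameters can be tuned per index. The sketch below builds an index-creation body as a Python dictionary; the similarity name `tuned_bm25` and the field name `body` are hypothetical, and the `k1` and `b` values shown are simply the defaults written out.

```python
import json

# Request body for creating an index with an explicitly configured BM25.
index_body = {
    "settings": {
        "index": {
            "similarity": {
                "tuned_bm25": {      # hypothetical name for this similarity
                    "type": "BM25",
                    "k1": 1.2,       # controls term frequency saturation
                    "b": 0.75,       # controls document length normalization
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # hypothetical text field scored with the similarity above
            "body": {"type": "text", "similarity": "tuned_bm25"}
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The printed JSON can be sent as the body of an index-creation request; the index name is left to the reader.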
Why does BM25 include document length normalization?
BM25 normalizes by document length to prevent longer documents from scoring higher just because they have more words.
In TF-IDF, what happens if a term appears in many documents?
If a term appears in many documents, its IDF score decreases because it is less unique.
Which factor does BM25 consider that basic TF-IDF does not?
BM25 adds document length normalization, which basic TF-IDF does not include, so longer documents do not score higher simply because they contain more words.
Describe how TF-IDF helps rank documents in a search engine.
Think about how often a word appears in one document versus many documents.
Explain why BM25 is considered an improvement over TF-IDF in Elasticsearch.
Consider how BM25 handles long documents and repeated terms.