
TF-IDF and BM25 scoring in Elasticsearch - Deep Dive

Overview - TF-IDF and BM25 scoring
What is it?
TF-IDF and BM25 are methods used to rank documents by how relevant they are to a search query. TF-IDF stands for Term Frequency-Inverse Document Frequency, which measures how important a word is in a document compared to all documents. BM25 is a more advanced scoring method that improves on TF-IDF by considering document length and term saturation. Both help search engines like Elasticsearch find the best matches for your search words.
Why it matters
Without these scoring methods, search engines would treat all words and documents equally, making search results less useful and less relevant. Imagine searching for a recipe and getting random pages instead of the best matches. TF-IDF and BM25 solve this by ranking documents so the most relevant ones appear first, saving time and improving user experience.
Where it fits
Before learning TF-IDF and BM25, you should understand basic search concepts like keywords and documents. After this, you can explore how Elasticsearch uses these scores in queries and how to tune search relevance for better results.
Mental Model
Core Idea
TF-IDF and BM25 score documents by balancing how often words appear in a document against how rare those words are across all documents, adjusting for document length and word repetition.
Think of it like...
Imagine you are looking for a book in a library. TF-IDF is like noticing how many times a word appears in a book (term frequency) and how unique that word is in the whole library (inverse document frequency). BM25 adds the idea that longer books might naturally have more words, so it adjusts your attention to avoid favoring just longer books.
┌──────────────┐      ┌───────────────────────────────┐
│ Search Query │─────▶│ Calculate Term Frequency (TF) │
└──────────────┘      └───────────────┬───────────────┘
                                      │
                                      ▼
                      ┌───────────────────────────────┐
                      │ Calculate Inverse Document    │
                      │ Frequency (IDF)               │
                      └───────────────┬───────────────┘
                                      │
                                      ▼
                      ┌───────────────────────────────┐
                      │ Combine TF and IDF to score   │
                      │ documents (TF-IDF)            │
                      └───────────────┬───────────────┘
                                      │
                                      ▼
                      ┌───────────────────────────────┐
                      │ Adjust scores for document    │
                      │ length and term saturation    │
                      │ (BM25)                        │
                      └───────────────┬───────────────┘
                                      │
                                      ▼
                      ┌───────────────────────────────┐
                      │ Rank documents by score       │
                      └───────────────────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding Term Frequency (TF)
Concept: Term Frequency counts how often a word appears in a single document.
Term Frequency (TF) measures the number of times a word appears in a document. For example, if the word 'apple' appears 3 times in a document, its raw TF is 3. (Some variants normalize by document length, e.g. 3/100 = 0.03 for a 100-word document.) The higher the TF, the more prominent the word is in that document.
Result
You get a number showing how common a word is in one document.
Understanding TF helps you see why words repeated more in a document might be more relevant to that document's topic.
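A minimal sketch of this counting in Python (the helper name and sample text are made up for illustration):

```python
# Term frequency (TF): count how often a term occurs in one document.
def term_frequency(term, document):
    words = document.lower().split()
    return words.count(term)

doc = "apple pie with apple slices and one more apple"
print(term_frequency("apple", doc))  # raw TF of 3
```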
2
Foundation: Grasping Inverse Document Frequency (IDF)
Concept: Inverse Document Frequency measures how rare a word is across all documents.
IDF looks at how many documents contain a word. If a word appears in many documents, it is less useful for distinguishing one document from another. For example, common words like 'the' appear everywhere, so their IDF is low. Rare words have high IDF, making them more important for search.
Result
You get a number that tells how unique or common a word is across all documents.
Knowing IDF helps you understand why rare words carry more weight in search relevance.
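A sketch of the classic IDF formula, log(N / df), where N is the total number of documents and df is how many contain the term (Lucene's BM25 uses a smoothed variant, but the intuition is the same; the sample documents are invented):

```python
import math

# Inverse document frequency (IDF): rare terms score higher.
def idf(term, documents):
    n = len(documents)
    df = sum(1 for d in documents if term in d.lower().split())
    return math.log(n / df) if df else 0.0

docs = [
    "the apple fell from the tree",
    "the cat sat on the mat",
    "the dog chased the cat",
]
print(idf("the", docs))    # in all 3 docs -> log(3/3) = 0.0
print(idf("apple", docs))  # in 1 of 3 docs -> log(3/1) ≈ 1.10
```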
3
Intermediate: Combining TF and IDF for Scoring
🤔 Before reading on: do you think a word that appears often but is common across documents should have a high or low score? Commit to your answer.
Concept: TF-IDF multiplies term frequency by inverse document frequency to score words in documents.
TF-IDF score = TF × IDF. This means a word scores higher if it appears often in a document but is rare across documents. For example, 'apple' appearing 3 times in one document but rarely elsewhere will have a high TF-IDF score, making that document more relevant for 'apple'.
Result
Documents get scores that reflect both word frequency and uniqueness.
Understanding TF-IDF scoring explains how search engines rank documents by relevance, balancing common and rare words.
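Putting the two together, a small sketch of the TF × IDF product (document collection is contrived for illustration):

```python
import math

# TF-IDF = TF × IDF: terms frequent in this document but rare overall win.
def tf_idf(term, document, documents):
    tf = document.lower().split().count(term)
    df = sum(1 for d in documents if term in d.lower().split())
    return tf * math.log(len(documents) / df) if df else 0.0

docs = [
    "apple pie apple tart apple sauce",
    "banana bread recipe",
    "a guide to pastry",
]
# 'apple' appears 3 times in docs[0] and in only 1 of 3 docs:
print(tf_idf("apple", docs[0], docs))  # 3 * log(3) ≈ 3.30
```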
4
Intermediate: Limitations of TF-IDF in Search
🤔 Before reading on: do you think longer documents always get higher TF-IDF scores? Yes or no? Commit to your answer.
Concept: TF-IDF does not adjust for document length or repeated terms beyond frequency, which can bias scores.
Longer documents may have higher term counts simply because they have more words, not because they are more relevant. Also, TF-IDF treats each occurrence equally, so repeating a word many times can inflate scores unfairly. This can cause longer or verbose documents to rank higher even if they are less relevant.
Result
TF-IDF can produce biased rankings favoring longer documents or repeated words.
Knowing TF-IDF's limits prepares you to understand why improved methods like BM25 are needed.
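The length bias is easy to see with a contrived example (both documents are invented): a padded document that merely repeats a term outscores a short, focused one on raw TF.

```python
# Raw TF grows with document length, so a verbose document that merely
# repeats a term can outscore a short, focused one.
short_doc = "apple varieties"                        # 2 words, one mention
long_doc = "apple " * 10 + "unrelated filler " * 45  # 100 words, 10 mentions

tf_short = short_doc.split().count("apple")
tf_long = long_doc.split().count("apple")
print(tf_short, tf_long)  # 1 vs 10: the padded document wins on raw TF
```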
5
Intermediate: Introducing BM25 Scoring Method
Concept: BM25 improves TF-IDF by adjusting for document length and term saturation.
BM25 scores documents by considering term frequency, inverse document frequency, document length, and how repeated terms contribute less after a point. It uses parameters to control how much document length affects the score and how term frequency saturates. This makes BM25 better at ranking documents fairly regardless of length.
Result
Search results become more balanced and relevant, avoiding bias toward long documents.
Understanding BM25 shows how search engines refine scoring to improve user search experience.
6
Advanced: BM25 Formula and Parameters Explained
🤔 Before reading on: do you think increasing the parameter controlling term frequency saturation will make repeated words count more or less? Commit to your answer.
Concept: BM25 formula uses parameters k1 and b to tune term frequency impact and document length normalization.
The BM25 score for a single term is calculated as:

score = IDF × (TF × (k1 + 1)) / (TF + k1 × (1 - b + b × (docLength / avgDocLength)))

- k1 controls how quickly term frequency saturates (typically 1.2-2.0; Elasticsearch defaults to 1.2).
- b controls how much document length affects the score: 0 means no length normalization, 1 means full normalization (Elasticsearch defaults to 0.75).

These parameters let you adjust scoring to fit your data and search needs.
Result
You can fine-tune search relevance by adjusting BM25 parameters.
Knowing BM25 parameters empowers you to customize search ranking for different document collections.
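The formula above can be sketched directly in Python. Note the IDF here uses the smoothed form found in Lucene's BM25 rather than the plain log(N/df); all input numbers below are invented for illustration:

```python
import math

# A minimal BM25 scorer for one term.
# k1 controls term-frequency saturation, b controls length normalization.
def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # Smoothed IDF as used by Lucene's BM25 (always non-negative):
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

# Same term frequency, but the longer document is penalized:
print(bm25_term_score(tf=3, df=10, n_docs=1000, doc_len=100, avg_doc_len=100))
print(bm25_term_score(tf=3, df=10, n_docs=1000, doc_len=500, avg_doc_len=100))
```

Setting b=0 in the call above makes both scores identical, which is exactly what "no length normalization" means.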
7
Expert: How Elasticsearch Implements BM25 by Default
🤔 Before reading on: do you think Elasticsearch allows changing BM25 parameters per field or only globally? Commit to your answer.
Concept: Elasticsearch uses BM25 as the default similarity scoring algorithm and allows per-field parameter tuning.
In Elasticsearch, BM25 is the default similarity algorithm for text fields. You can configure k1 and b parameters per field in the index settings to optimize search relevance. Elasticsearch also combines BM25 with other features like field length norms and index-time statistics for efficient scoring. Understanding this helps you tune your search engine for best results.
Result
You can improve search quality by customizing BM25 settings in Elasticsearch.
Knowing Elasticsearch's BM25 implementation details helps you leverage its full power for real-world search applications.
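As a sketch of this per-field tuning (the index name, similarity name, and parameter values here are illustrative, not recommendations), the index settings below define a custom BM25 similarity and attach it to a single field:

```
PUT /my_index
{
  "settings": {
    "index": {
      "similarity": {
        "tuned_bm25": {
          "type": "BM25",
          "k1": 1.4,
          "b": 0.6
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "similarity": "tuned_bm25"
      }
    }
  }
}
```

Fields without an explicit "similarity" keep the built-in BM25 defaults (k1 = 1.2, b = 0.75).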
Under the Hood
TF-IDF works by counting word occurrences in documents and across the whole collection, then multiplying these counts to score relevance. BM25 builds on this by adding a formula that normalizes scores based on document length and limits how much repeated words increase the score. Elasticsearch stores term statistics and document lengths in its inverted index, allowing fast calculation of these scores at query time.
Why designed this way?
TF-IDF was designed to highlight words that are important in a document but rare overall, solving the problem of common words diluting search relevance. BM25 was created to fix TF-IDF's bias toward longer documents and repeated terms, making search results fairer and more accurate. Elasticsearch adopted BM25 as default because it balances effectiveness and efficiency for large-scale search.
┌───────────────┐      ┌────────────────┐      ┌─────────────────┐
│ Document Text │─────▶│ Inverted Index │─────▶│ Term Statistics │
└───────────────┘      └────────────────┘      └────────┬────────┘
                                                        │
                                                        ▼
                                           ┌─────────────────────────┐
                                           │ Calculate TF and IDF    │
                                           └────────────┬────────────┘
                                                        │
                                                        ▼
                                           ┌─────────────────────────┐
                                           │ Apply BM25 formula with │
                                           │ length normalization    │
                                           └────────────┬────────────┘
                                                        │
                                                        ▼
                                           ┌─────────────────────────┐
                                           │ Score and rank documents│
                                           └─────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does TF-IDF always give better search results than simple keyword matching? Commit yes or no.
Common Belief: TF-IDF always improves search results compared to just matching keywords.
Reality: TF-IDF improves relevance but can still rank long or verbose documents too high without length normalization.
Why it matters: Relying only on TF-IDF can lead to poor search quality, frustrating users with irrelevant long documents.
Quick: Is BM25 just a complicated version of TF-IDF with no real benefit? Commit yes or no.
Common Belief: BM25 is just a complex TF-IDF variant without practical advantages.
Reality: BM25 significantly improves ranking by adjusting for document length and term saturation, leading to better search relevance.
Why it matters: Ignoring BM25 means missing out on more accurate and fair search results, especially in collections with varied document lengths.
Quick: Does increasing term frequency always increase BM25 score linearly? Commit yes or no.
Common Belief: More occurrences of a word always increase BM25 score proportionally.
Reality: BM25 uses saturation, so after a point, more occurrences add less to the score to avoid overemphasizing repeated words.
Why it matters: Misunderstanding this can cause wrong tuning of search parameters, leading to skewed rankings.
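The saturation effect is visible from the TF part of the BM25 formula alone (here with a doc of average length, so the length term drops out; k1 value is just the usual default):

```python
# BM25's TF component approaches an asymptote of k1 + 1, so each extra
# occurrence of a term adds less than the one before it.
k1 = 1.2
for tf in (1, 2, 5, 20, 100):
    print(tf, round(tf * (k1 + 1) / (tf + k1), 2))
# The contribution climbs toward 2.2 (= k1 + 1) but never reaches it.
```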
Quick: Can you change BM25 parameters globally and per field in Elasticsearch? Commit yes or no.
Common Belief: BM25 parameters can only be set globally for the whole index.
Reality: Elasticsearch allows setting BM25 parameters per field, enabling fine-grained control over scoring.
Why it matters: Not knowing this limits your ability to optimize search relevance for different types of content.
Expert Zone
1
BM25's length normalization parameter b can be tuned to handle collections with very short or very long documents differently, which many overlook.
2
The saturation effect controlled by k1 prevents term frequency from dominating scores, but setting k1 too low can underweight important repeated terms.
3
Elasticsearch layers other scoring factors on top of BM25, such as query-time boosts and multi-field score combination, which subtly influence final rankings beyond the per-term formula. (Lucene's old coordination factor no longer applies; it was removed when BM25 became the default similarity.)
When NOT to use
BM25 and TF-IDF are less effective for semantic or contextual search where word meaning matters more than frequency. In such cases, use vector search or neural embeddings. Also, for very short texts or structured data, simpler scoring or exact matching may be better.
Production Patterns
In production, Elasticsearch users often tune BM25 parameters per field to balance recall and precision. They combine BM25 with filters and boosting to prioritize certain documents. Monitoring search logs helps adjust parameters over time for evolving data.
Connections
Information Retrieval
TF-IDF and BM25 are foundational algorithms in the field of information retrieval.
Understanding these scoring methods deepens knowledge of how search engines retrieve and rank information efficiently.
Probability Theory
BM25 scoring is based on probabilistic models estimating the likelihood a document is relevant given a query.
Knowing probability concepts helps grasp why BM25 uses saturation and normalization to model relevance realistically.
Human Attention and Memory
TF-IDF and BM25 mimic how humans focus on rare but important words when searching for information.
Recognizing this connection explains why these algorithms feel intuitive and effective in ranking search results.
Common Pitfalls
#1 Ignoring document length causes biased rankings.
Wrong approach: Ranking by raw TF-IDF scores without length normalization:
SELECT * FROM documents ORDER BY tf_idf_score DESC;
Correct approach: Use BM25 scoring in Elasticsearch, which normalizes for document length:
GET /_search
{ "query": { "match": { "content": { "query": "search terms", "operator": "and" } } } }
Root cause: Not accounting for document length makes longer documents unfairly score higher.
#2 Setting BM25 parameters without understanding their effect.
Wrong approach: Setting k1 to 0 or b to 0 without testing:
PUT /my_index
{ "settings": { "similarity": { "default": { "type": "BM25", "k1": 0, "b": 0 } } } }
Correct approach: Start from the defaults (k1 = 1.2, b = 0.75) and tune gradually:
PUT /my_index
{ "settings": { "similarity": { "default": { "type": "BM25", "k1": 1.2, "b": 0.75 } } } }
Root cause: Misunderstanding parameters leads to poor scoring and irrelevant results.
#3 Assuming TF-IDF and BM25 are interchangeable without impact.
Wrong approach: Switching from BM25 to classic TF-IDF in Elasticsearch without re-tuning (note that the "classic" similarity was deprecated in Elasticsearch 6.4 and removed in 7.0):
PUT /my_index
{ "settings": { "similarity": { "default": { "type": "classic" } } } }
Correct approach: Evaluate and tune search relevance after changing similarity algorithms.
Root cause: Different algorithms behave differently; ignoring this causes unexpected search quality drops.
Key Takeaways
TF-IDF scores documents by balancing how often words appear in a document against how rare they are across all documents.
BM25 improves on TF-IDF by adjusting for document length and limiting the impact of repeated words to produce fairer rankings.
Elasticsearch uses BM25 as the default scoring method and allows tuning its parameters per field for better search relevance.
Understanding the math and parameters behind BM25 helps you customize search engines to your data and user needs.
Misusing or misunderstanding these scoring methods can lead to poor search results, so careful tuning and evaluation are essential.