TF-IDF and BM25 are relevance scoring methods. Both estimate how well each document matches the terms in a search query by rewarding terms that occur often in the document but rarely across the index.
TF-IDF and BM25 scoring in Elasticsearch
GET /your_index/_search
{
  "query": {
    "match": {
      "field_name": {
        "query": "search words",
        "operator": "and"
      }
    }
  },
  "explain": true
}
Elasticsearch has used BM25 as its default scoring method since version 5.0.
Set "explain": true in the search request to see how the TF-IDF or BM25 score of each hit was calculated.
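The explanation returned for each hit is a nested tree in which every node has a value, a description, and a list of details. As a minimal sketch of how such a tree can be read, the Python helper below flattens it into indented lines. The function name flatten_explanation and the sample data are hypothetical: the sample is hand-written in the shape of an "_explanation" object, and its numbers are not taken from a real response.

```python
from typing import Any, Dict, List


def flatten_explanation(node: Dict[str, Any], depth: int = 0) -> List[str]:
    """Turn a nested explanation into indented 'value  description' lines."""
    lines = [f"{'  ' * depth}{node['value']:.4f}  {node['description']}"]
    # Each child in "details" explains one factor of the parent's value.
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


# Hypothetical sample shaped like an "_explanation" object.
# The values and descriptions are invented for illustration only.
sample = {
    "value": 0.5,
    "description": "score, computed as boost * idf * tf",
    "details": [
        {"value": 2.0, "description": "boost", "details": []},
        {"value": 0.5, "description": "idf", "details": []},
        {"value": 0.5, "description": "tf", "details": []},
    ],
}

for line in flatten_explanation(sample):
    print(line)
```

Reading the tree from the top down shows the final score first and then the factors that were multiplied or summed to produce it.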
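A basic match query on the 'content' field, scored with the default BM25:
GET /my_index/_search
{
  "query": {
    "match": {
      "content": "quick brown fox"
    }
  }
}
The same query with an explicit "or" operator and explanation enabled:
GET /my_index/_search
{
  "query": {
    "match": {
      "content": {
        "query": "quick brown fox",
        "operator": "or"
      }
    }
  },
  "explain": true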
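}
An index that defines a TF-IDF ('classic') similarity and assigns it to the 'content' field. Defining the similarity in the settings is not enough on its own; the field mapping must reference it by name:
PUT /my_index
{
  "settings": {
    "similarity": {
      "my_tfidf": {
        "type": "classic"
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "similarity": "my_tfidf"
      }
    }
  }
}
The following example creates a 'books' index whose 'title' field uses TF-IDF scoring, while 'description' keeps the default BM25. It adds two documents and searches titles for 'quick fox'. Because the query uses the "and" operator, only the first document matches, and the explanation shows how TF-IDF scored it.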
PUT /books
{
  "settings": {
    "similarity": {
      "my_tfidf": {
        "type": "classic"
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "similarity": "my_tfidf"
      },
      "description": {
        "type": "text"
      }
    }
  }
}
POST /books/_doc/1
{
  "title": "The quick brown fox",
  "description": "A story about a quick fox."
}
POST /books/_doc/2
{
  "title": "Lazy dog sleeps",
  "description": "A story about a lazy dog."
}
GET /books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick fox",
        "operator": "and"
      }
    }
  },
  "explain": true
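}
BM25 suits most search workloads because it balances term frequency and document length: repeated occurrences of a term add progressively less to the score, and long documents are not favored simply for containing more words.
TF-IDF (classic similarity) is older but useful for understanding basic scoring concepts. The classic similarity was removed in Elasticsearch 7.0, so requests that define it are rejected on 7.0 and later; there, a scripted similarity is the suggested way to reproduce TF-IDF.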
Use the 'explain' option in your search to see how scores are calculated step-by-step.
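To make the point about term frequency and document length concrete, here is a minimal Python sketch that compares a textbook BM25 term score with a simplified TF-IDF variant. The function names are hypothetical, k1 = 1.2 and b = 0.75 are the usual BM25 defaults, and the TF-IDF function keeps only square-root term frequency times inverse document frequency. Lucene's classic similarity includes further factors that are omitted here, and the constants shown in an explanation may be grouped differently, so treat the numbers as illustrative rather than as what Elasticsearch returns.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document."""
    # Rare terms (small doc_freq) get a larger inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Documents longer than average are penalized; b controls how strongly.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # The fraction approaches (k1 + 1) as freq grows, so the score saturates.
    return idf * freq * (k1 + 1) / (freq + k1 * length_norm)


def tfidf_term_score(freq, n_docs, doc_freq):
    """Simplified TF-IDF (assumption: no length norm, single idf factor)."""
    idf = 1 + math.log((n_docs + 1) / (doc_freq + 1))
    # sqrt(freq) keeps growing without an upper bound.
    return math.sqrt(freq) * idf


# Hypothetical index statistics: 1000 documents, term appears in 10 of them.
for freq in (1, 2, 5, 20, 100):
    bm25 = bm25_term_score(freq, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
    tfidf = tfidf_term_score(freq, n_docs=1000, doc_freq=10)
    print(f"freq={freq:>3}  bm25={bm25:.3f}  tfidf={tfidf:.3f}")
```

In this sketch the BM25 column levels off as the frequency rises while the TF-IDF column keeps climbing, and passing a larger doc_len to bm25_term_score lowers the score for the same frequency.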
TF-IDF and BM25 both score how relevant a document is to the terms in a search query.
Elasticsearch uses BM25 by default; TF-IDF is available through the classic similarity on versions before 7.0.
Understanding how scores are calculated helps you tune queries and mappings so that the most relevant documents rank first.