Elasticsearch · query · ~30 mins

TF-IDF and BM25 scoring in Elasticsearch - Mini Project: Build & Apply

TF-IDF and BM25 Scoring with Elasticsearch
📖 Scenario: You are building a simple search engine for a small online bookstore and want to understand how Elasticsearch scores documents with the TF-IDF and BM25 algorithms. We will create an index with a few book descriptions, configure the scoring algorithm, and run queries to see how the scores differ.
🎯 Goal: Create an Elasticsearch index with book data, configure the similarity scoring to use TF-IDF and BM25, and run queries to compare the scoring results.
📋 What You'll Learn
Create an Elasticsearch index named books with a description field.
Configure the description field to use the classic similarity (TF-IDF) in one step.
Configure the description field to use the BM25 similarity in another step.
Index three book documents with the exact titles and descriptions given in step 1.
Run a search query on the description field for the term adventure.
Compare the scores returned by TF-IDF and BM25 configurations.
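
To make the comparison in the last item concrete, the following Python sketch scores the term adventure against the three descriptions used in this project, once with a TF-IDF formula modeled on Lucene's classic similarity and once with textbook BM25. It is an illustration under stated assumptions, not Elasticsearch's implementation: the tokenizer is a simple lowercase word split, k1 = 1.2 and b = 0.75 are the documented BM25 defaults, and the function names are made up for this example. Scores from a real cluster will differ because of per-shard statistics, length-norm encoding, and version differences.

```python
import math
import re

# The three descriptions from this project, keyed by title.
DOCS = {
    "The Lost Island": "An exciting adventure on a mysterious island.",
    "Space Journey": "A thrilling adventure through the stars.",
    "Cooking 101": "Basic cooking techniques for beginners.",
}

def tokenize(text):
    # Assumption: a lowercase word split, a rough stand-in for the standard analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())

TOKENS = {title: tokenize(desc) for title, desc in DOCS.items()}
N = len(TOKENS)                                       # number of documents
AVG_LEN = sum(len(t) for t in TOKENS.values()) / N    # average field length in tokens

def doc_freq(term):
    # Number of documents whose description contains the term.
    return sum(1 for toks in TOKENS.values() if term in toks)

def tfidf_score(term, toks):
    # sqrt(tf) * (1 + ln((N + 1) / (df + 1))) / sqrt(field length)
    tf = toks.count(term)
    if tf == 0:
        return 0.0
    idf = 1.0 + math.log((N + 1) / (doc_freq(term) + 1))
    return math.sqrt(tf) * idf / math.sqrt(len(toks))

def bm25_score(term, toks, k1=1.2, b=0.75):
    # idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len / avg_len))
    tf = toks.count(term)
    if tf == 0:
        return 0.0
    df = doc_freq(term)
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    length_norm = 1 - b + b * len(toks) / AVG_LEN
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

for title, toks in TOKENS.items():
    tfidf = tfidf_score("adventure", toks)
    bm25 = bm25_score("adventure", toks)
    print(f"{title:16} tfidf={tfidf:.4f} bm25={bm25:.4f}")
```

Under these assumptions both formulas rank Space Journey slightly above The Lost Island: the term occurs once in each description, and length normalization favors the shorter one. Cooking 101 does not contain the term and scores zero.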
💡 Why This Matters
🌍 Real World
Search engines use scoring algorithms like TF-IDF and BM25 to rank documents by relevance. Understanding these helps improve search quality in applications like online bookstores, news sites, and more.
💼 Career
Many jobs in data engineering, search engine development, and backend development require knowledge of Elasticsearch and how to tune search relevance using scoring algorithms.
1
Create the books index with three book documents
Create an Elasticsearch index called books and index these three documents with fields title and description: {"title": "The Lost Island", "description": "An exciting adventure on a mysterious island."}, {"title": "Space Journey", "description": "A thrilling adventure through the stars."}, and {"title": "Cooking 101", "description": "Basic cooking techniques for beginners."}.
Need a hint?

Use PUT to create the index with mappings, then POST to add documents.
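
One possible shape for these requests, sketched in Python so the bodies can be checked before they are sent. The sketch only prints each request in console form (method and path, then the JSON body); it does not contact a cluster. It assumes typeless mappings (Elasticsearch 7 or later), and mapping both fields as text is an assumption, since the step only fixes the field names.

```python
import json

# Assumption: both fields are plain `text`; the step only names the fields.
create_index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
        }
    }
}

# The three documents exactly as given in the step.
books = [
    {"title": "The Lost Island", "description": "An exciting adventure on a mysterious island."},
    {"title": "Space Journey", "description": "A thrilling adventure through the stars."},
    {"title": "Cooking 101", "description": "Basic cooking techniques for beginners."},
]

# One PUT to create the index, then one POST per document.
requests_to_send = [("PUT", "/books", create_index_body)]
requests_to_send += [("POST", "/books/_doc", book) for book in books]

for method, path, body in requests_to_send:
    print(f"{method} {path}")
    print(json.dumps(body, indent=2))
```

The printed output can be pasted into the Kibana Dev Tools console or sent with any HTTP client.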

2
Configure the description field to use TF-IDF (classic similarity)
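Update the books index mapping to set the description field's similarity to classic (TF-IDF). Use the PUT /books/_mapping API to add "similarity": "classic" to the description field. Note that the classic similarity was removed in Elasticsearch 7.0, so on newer versions TF-IDF has to be defined as a scripted similarity instead. Elasticsearch may also reject a similarity change on a field that already exists; if it does, delete the index, recreate it with the similarity set in the mapping, and index the documents again.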
Need a hint?

Set "similarity": "classic" inside the description field mapping.
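
A minimal Python sketch of two request bodies for this step; it only prints them and does not contact a cluster. The first is the literal mapping update the step describes, which should only be expected to work where classic is still available (indices created before Elasticsearch 7.0). The second is a possible alternative for newer versions: an index created with a scripted similarity whose Painless source is modeled on the TF-IDF example in the Elasticsearch similarity documentation. The similarity name scripted_tfidf and the index name books_tfidf are made up for illustration.

```python
import json

# Literal body from the step (assumption: a pre-7.0 index where "classic" exists).
classic_mapping = {
    "properties": {
        "description": {"type": "text", "similarity": "classic"}
    }
}

# Painless source for a TF-IDF style score: sqrt(tf) * idf * length norm.
TFIDF_SCRIPT = (
    "double tf = Math.sqrt(doc.freq); "
    "double idf = Math.log((field.docCount + 1.0) / (term.docFreq + 1.0)) + 1.0; "
    "double norm = 1 / Math.sqrt(doc.length); "
    "return query.boost * tf * idf * norm;"
)

# Alternative for 7.0+: define the similarity in the index settings at creation
# time and reference it by name from the field mapping.
scripted_index = {
    "settings": {
        "number_of_shards": 1,  # one shard keeps term statistics in one place
        "similarity": {
            "scripted_tfidf": {  # hypothetical name
                "type": "scripted",
                "script": {"source": TFIDF_SCRIPT},
            }
        },
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text", "similarity": "scripted_tfidf"},
        }
    },
}

print("PUT /books/_mapping")
print(json.dumps(classic_mapping, indent=2))
print("PUT /books_tfidf")  # hypothetical index name
print(json.dumps(scripted_index, indent=2))
```

With the second body, the three documents would need to be indexed into the new index before the search in step 4 returns TF-IDF style scores.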

3
Configure the description field to use BM25 similarity
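Update the books index mapping to set the description field's similarity to BM25. Use the PUT /books/_mapping API to add "similarity": "BM25" to the description field. BM25 is already the default similarity for text fields, so this step makes the default explicit. If Elasticsearch rejects the change because the field already exists with a different similarity, recreate the index with the similarity set in the mapping and index the documents again.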
Need a hint?

Set "similarity": "BM25" inside the description field mapping.
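
The mapping body for this step can be sketched in Python as below; the sketch only prints the requests. The second body is optional and goes beyond what the step asks: it shows how a named BM25 similarity with explicit k1 and b values could be declared at index creation, using the documented defaults of 1.2 and 0.75. The names tuned_bm25 and books_bm25 are hypothetical.

```python
import json

# Literal body from the step: select the built-in BM25 similarity by name.
bm25_mapping = {
    "properties": {
        "description": {"type": "text", "similarity": "BM25"}
    }
}

# Optional variant: a named BM25 similarity with explicit parameters,
# declared in the index settings when the index is created.
tuned_index = {
    "settings": {
        "similarity": {
            "tuned_bm25": {  # hypothetical name
                "type": "BM25",
                "k1": 1.2,   # term-frequency saturation (default value)
                "b": 0.75,   # strength of length normalization (default value)
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text", "similarity": "tuned_bm25"},
        }
    },
}

print("PUT /books/_mapping")
print(json.dumps(bm25_mapping, indent=2))
print("PUT /books_bm25")  # hypothetical index name
print(json.dumps(tuned_index, indent=2))
```

Raising k1 lets repeated terms keep adding to the score for longer, and lowering b reduces the penalty for long descriptions; with documents this short the effect is small.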

4
Run a search query for adventure on the description field
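Write a search query using GET /books/_search that searches the description field for the term adventure. Use a match query on description with the value adventure. Each hit in the response carries a _score value; run the same query under both similarity configurations to compare the scores.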
Need a hint?

Use a match query on the description field with the term adventure.
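
A minimal Python sketch of the search body, plus a small helper for reading scores out of a parsed response. The explain flag is optional and asks Elasticsearch to include a per-hit score breakdown. The helper name summarize_hits is made up for this example, and it assumes the usual _search response layout with hits.hits[]._score and hits.hits[]._source.

```python
import json

# Match query on the description field for the term "adventure".
search_body = {
    "query": {"match": {"description": "adventure"}},
    "explain": True,  # optional: include a score explanation for each hit
}

def summarize_hits(response):
    """Return (title, score) pairs from a parsed _search response, best first."""
    hits = response.get("hits", {}).get("hits", [])
    pairs = [(hit["_source"]["title"], hit["_score"]) for hit in hits]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

print("GET /books/_search")
print(json.dumps(search_body, indent=2))
```

Calling summarize_hits on the response from each configuration gives two short lists that can be compared side by side.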