Elasticsearch query · ~30 mins

Token filters (lowercase, stemmer, synonym) in Elasticsearch - Mini Project: Build & Apply

Create a Custom Elasticsearch Analyzer with Token Filters
📖 Scenario: You are setting up a search engine for a book store. You want to make sure that searches find books even if users type words in different forms or cases. For example, searching for "Running" should find books with "running" or "run". Also, some words have synonyms, like "quick" and "fast".
🎯 Goal: Build an Elasticsearch index with a custom analyzer that uses token filters: lowercase, stemmer, and synonym filters. This will help the search engine understand different word forms and synonyms.
📋 What You'll Learn
Create an index called books.
Define a custom analyzer named custom_analyzer.
Use the standard tokenizer in the analyzer.
Add three token filters to the analyzer: lowercase, english_stemmer, and synonym_filter.
Define the english_stemmer filter as a stemmer for English.
Define the synonym_filter with synonyms: quick,fast and jumps,leaps.
💡 Why This Matters
🌍 Real World
Search engines often need to understand different word forms and synonyms to give better results. This project shows how to set up such features in Elasticsearch.
💼 Career
Many jobs in search engineering, data engineering, and backend development require knowledge of text analysis and Elasticsearch configuration.
1
Create the index with basic settings
Create an Elasticsearch index called books with an empty settings object.
Hint: Use the PUT method to create the index books with an empty settings object.
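A minimal request that satisfies this step, in REST / Kibana Dev Tools syntax (one possible form, not the only accepted answer):

```
PUT /books
{
  "settings": {}
}
```

Elasticsearch responds with `"acknowledged": true` when the index is created.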

2
Add the custom analyzer with tokenizer and filters
Inside the settings, add an analysis section. Define a custom analyzer named custom_analyzer that uses the standard tokenizer and the token filters lowercase, english_stemmer, and synonym_filter in that order.
Hint: Remember to put the analysis section inside settings, and define the analyzer with the exact name custom_analyzer.
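The analyzer definition goes under `settings.analysis.analyzer`. A sketch of the request follows; note that Elasticsearch will only accept it once the two custom filters it references are also defined, which the next step covers:

```
PUT /books
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stemmer", "synonym_filter"]
        }
      }
    }
  }
}
```

The order of the `filter` array matters: tokens are lowercased first, then stemmed, then passed through the synonym filter.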

3
Define the stemmer and synonym token filters
Add the token filters english_stemmer and synonym_filter inside the analysis section. Define english_stemmer as a stemmer filter with stemmer type and english language. Define synonym_filter as a synonym filter with synonyms quick,fast and jumps,leaps.
Hint: Define the filters inside the filter section under analysis, using the exact names english_stemmer and synonym_filter.
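Putting the pieces together, the complete settings body looks like this; the filter definitions live in `settings.analysis.filter`, alongside the `analyzer` section:

```
PUT /books
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stemmer", "synonym_filter"]
        }
      },
      "filter": {
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "synonym_filter": {
          "type": "synonym",
          "synonyms": ["quick,fast", "jumps,leaps"]
        }
      }
    }
  }
}
```

Each comma-separated rule (e.g. `"quick,fast"`) declares an equivalence set, so a search for either word matches documents containing the other.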

4
Test the analyzer with a sample text
Use the _analyze API to test the custom_analyzer on the text "The quick brown fox jumps running fast". Print the tokens produced by the analyzer.
Hint: Use the POST /books/_analyze API with the custom_analyzer and the given text. The output tokens should include stemmed and synonym forms.
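The test request, again in REST syntax:

```
POST /books/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "The quick brown fox jumps running fast"
}
```

Because lowercasing and stemming run before synonym expansion, expect stemmed tokens such as `run` and `jump` alongside the expansions for the quick/fast pair. The exact token list can vary by Elasticsearch version, since newer versions also analyze the synonym rules themselves with the filters that precede the synonym filter.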