
What is Stemming in Elasticsearch: Explanation and Example

In Elasticsearch, stemming is a process that reduces words to their root form to improve search matching. It helps find related words by cutting off suffixes, so a search for "running" also matches "run" or "runs".

How It Works

Stemming in Elasticsearch trims words to a basic form called the root or stem. Different forms of a word, such as "run", "runs", and "running", are all cut down to the common stem "run", so they are treated as the same term during searches. A stem does not have to be a dictionary word, and not every related word is merged: the default English stemmer leaves "runner" unchanged.

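This is useful because people often search using different word forms. Stemming lets Elasticsearch match these variations without you listing every form. The trimming is done by algorithms called stemmers, such as the Porter stemmer, which run as token filters inside an analyzer. By default the same analyzer processes both the indexed text and the search query, so both sides end up with the same stems.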
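To make the idea of suffix trimming concrete, here is a minimal sketch in Python. It is not the Porter algorithm and not what Elasticsearch runs internally; `toy_stem` is a hypothetical helper that only knows three suffixes, and real stemmers apply many more rules.

```python
def toy_stem(word: str) -> str:
    """Strip a few English suffixes. Illustration only, not the Porter algorithm."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words such as "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling left by "ing"/"ed": "runn" -> "run".
            if suffix != "s" and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


for word in ["run", "runs", "running", "jumped", "runner"]:
    print(word, "->", toy_stem(word))
```

In this sketch "run", "runs", and "running" all come out as "run", while "runner" is left alone because none of the three suffixes applies to it.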

Think of it like grouping all your shoes by type instead of color. You don’t care if they are red or blue; you just want to find all shoes. Stemming groups word forms so your search finds all related matches.


Example

This example defines a custom analyzer that uses the English stemmer, applies it to a text field, and then calls the _analyze API to show the stems it produces for words with different forms.

json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stemmer"]
        }
      },
      "filter": {
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_english_analyzer"
      }
    }
  }
}

POST /my_index/_analyze
{
  "analyzer": "my_english_analyzer",
  "text": "running runs runner"
}
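Output (type and position fields omitted)
{
  "tokens": [
    {"token": "run", "start_offset": 0, "end_offset": 7},
    {"token": "run", "start_offset": 8, "end_offset": 12},
    {"token": "runner", "start_offset": 13, "end_offset": 19}
  ]
}

The English stemmer reduces "running" and "runs" to "run" but leaves "runner" unchanged, because its rules do not strip the "er" ending from that word.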
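If you prefer to send these requests from code, the sketch below does the same two steps using only the Python standard library. It assumes a local development node reachable at `http://localhost:9200` without authentication or TLS, which is an assumption and will not hold for a secured cluster. The helper names (`index_body`, `send`, `analyzed_tokens`, `main`) are made up for this illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured development node
INDEX = "my_index"                # index name used in the _analyze request above


def index_body() -> dict:
    """Settings and mappings from the example: standard tokenizer, lowercase, English stemmer."""
    return {
        "settings": {"analysis": {
            "analyzer": {"my_english_analyzer": {
                "tokenizer": "standard",
                "filter": ["lowercase", "english_stemmer"],
            }},
            "filter": {"english_stemmer": {"type": "stemmer", "language": "english"}},
        }},
        "mappings": {"properties": {
            "content": {"type": "text", "analyzer": "my_english_analyzer"},
        }},
    }


def send(method: str, path: str, body: dict) -> dict:
    """Send a JSON request to the cluster and decode the JSON response."""
    request = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def analyzed_tokens(text: str) -> list:
    """Return the token strings the custom analyzer produces for `text`."""
    result = send("POST", f"/{INDEX}/_analyze",
                  {"analyzer": "my_english_analyzer", "text": text})
    return [token["token"] for token in result["tokens"]]


def main() -> None:
    send("PUT", f"/{INDEX}", index_body())  # create the index; fails if it already exists
    print(analyzed_tokens("running runs runner"))
```

Call `main()` once the node is running. Nothing is sent over the network until then, so the request bodies can be inspected or adjusted first.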

When to Use

Use stemming in Elasticsearch when you want your search to find different forms of a word without listing them all. It is great for full-text search on articles, blogs, or product descriptions where users may type variations of a word.

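For example, if you run an online store, stemming lets a search for "running shoes" also match a product titled "Running Shoe" or a description that mentions "shoes for long runs", because both the query and the indexed text are reduced to the stems "run" and "shoe". This makes search more forgiving of the exact wording users type.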
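The store scenario can be sketched as a tiny in-memory index in Python. This is only an illustration of why stemming both the documents and the query makes them meet on the same terms. The product data and the helpers `toy_stem`, `analyze`, `build_index`, and `search` are hypothetical, the stemmer is a crude stand-in for a real one, and the search requires every query term to match, whereas an Elasticsearch match query uses OR by default.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Very rough stand-in for a stemmer filter: strips "ing" or a plural "s"."""
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo consonant doubling, e.g. "runn" -> "run".
        return stem[:-1] if stem[-1] == stem[-2] else stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def analyze(text: str) -> list:
    """Lowercase, split on whitespace, then stem each token."""
    return [toy_stem(token) for token in text.lower().split()]


def build_index(docs: dict) -> dict:
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every stemmed query term."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*postings) if postings else set()


products = {
    1: "Lightweight running shoes",
    2: "Trail shoe for long runs",
    3: "Leather dress shoes",
}
index = build_index(products)
print(search(index, "running shoes"))  # prints {1, 2}
```

Product 2 never contains the words "running" or "shoes", yet it matches because "runs" and "shoe" share stems with the query terms. Product 3 is excluded because it has no term that stems to "run".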

However, avoid stemming if exact word forms matter, like in legal or medical documents, where "runs" and "run" might have different meanings.

Key Points

  • Stemming reduces words to their root form to improve search matching.
  • It helps find related word forms like "run", "running", and "runs".
  • Elasticsearch uses stemmer filters like the English stemmer for this.
  • Use stemming for flexible, user-friendly full-text search.
  • Avoid stemming when exact word forms are critical.

Key Takeaways

  • Stemming in Elasticsearch helps match different forms of a word by reducing them to a common root.
  • It improves search flexibility by grouping related word variations together.
  • Use stemming for general full-text search to catch more relevant results.
  • Configure stemming with stemmer filters in your analyzer settings.
  • Avoid stemming when precise word forms are important for your search context.