
Token filters (lowercase, stemmer, synonym) in Elasticsearch - Step-by-Step Execution

Concept Flow - Token filters (lowercase, stemmer, synonym)
Input Text
Tokenizer splits text
Lowercase filter: convert to lowercase
Stemmer filter: reduce words to root form
Synonym filter: replace words with synonyms
Output tokens for indexing/search
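The tokenizer splits the text into tokens, and the filters then run in the order listed in the analyzer: lowercase converts every letter to lowercase, the stemmer reduces each word toward a root form, and the synonym filter emits the equivalent terms defined in its rules.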
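The flow above can be sketched in plain Python. This is a toy model, not Elasticsearch's implementation: `tokenize`, `toy_stem`, `expand_synonyms`, and `analyze` are hypothetical names, the stemmer knows only three suffixes chosen to fit the sample sentence, and the synonym group mirrors the rule "quick,fast".

```python
import re

# One synonym group, mirroring the rule "quick,fast" (illustrative only).
SYNONYM_GROUPS = [["quick", "fast"]]


def tokenize(text):
    """Split on runs of word characters; a rough stand-in for the standard tokenizer."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def toy_stem(token):
    """Strip one of a few suffixes; a real stemmer has many more rules."""
    for suffix in ("ing", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Undouble a trailing consonant pair, e.g. "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token


def stem(tokens):
    """Apply the toy stemmer to every token."""
    return [toy_stem(t) for t in tokens]


def expand_synonyms(tokens):
    """Replace a token by its whole synonym group when it belongs to one."""
    out = []
    for t in tokens:
        group = next((g for g in SYNONYM_GROUPS if t in g), None)
        out.extend(group if group else [t])
    return out


def analyze(text):
    """Tokenize, then apply the filters in the same order as the analyzer."""
    tokens = tokenize(text)
    for token_filter in (lowercase, stem, expand_synonyms):
        tokens = token_filter(tokens)
    return tokens


print(analyze("Quickly running fast runners"))
# ['quick', 'fast', 'run', 'quick', 'fast', 'runner']
```

Because each filter only sees the output of the previous one, changing the order of the tuple in `analyze` changes the result.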
Execution Sample
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": ["quick,fast"]
        },
        "my_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "my_lowercase": {
          "type": "lowercase"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["my_lowercase", "my_stemmer", "my_synonym"]
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Quickly running fast runners"
}
The first request creates an index with a custom analyzer that lowercases, stems, and applies synonyms, in that order. The second request runs the sample text through that analyzer with the _analyze API and returns the resulting tokens.
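A common slip in a request like this is listing a filter in the analyzer that was never defined under "filter". A minimal sketch of a pre-flight check follows, assuming the request bodies are built as Python dictionaries before being sent. `missing_filters` is a hypothetical helper, not part of any Elasticsearch client, and it knows only about custom names, so a built-in name such as "lowercase" used directly in the list would also be reported.

```python
import json

# The same index settings as the PUT request, expressed as a Python dict.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonym": {"type": "synonym", "synonyms": ["quick,fast"]},
                "my_stemmer": {"type": "stemmer", "language": "english"},
                "my_lowercase": {"type": "lowercase"},
            },
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["my_lowercase", "my_stemmer", "my_synonym"],
                }
            },
        }
    }
}

# Body of the _analyze request from the sample.
analyze_body = {"analyzer": "my_analyzer", "text": "Quickly running fast runners"}


def missing_filters(body):
    """Return (analyzer, filter) pairs whose filter is not defined in the body."""
    analysis = body["settings"]["analysis"]
    defined = set(analysis.get("filter", {}))
    missing = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        for filter_name in analyzer.get("filter", []):
            if filter_name not in defined:
                missing.append((analyzer_name, filter_name))
    return missing


print(missing_filters(index_body))          # [] when every filter is defined
print(json.dumps(analyze_body, indent=2))   # JSON to send to /my_index/_analyze
```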
Execution Table
Step | Input Token | Filter Applied | Output Token | Notes
-----|-------------|----------------|--------------|------
1 | Quickly | Lowercase | quickly | Convert to lowercase
2 | quickly | Stemmer | quick | Stem to root form
3 | quick | Synonym | quick, fast | Replace with synonyms (expands to two tokens)
4 | running | Lowercase | running | Already lowercase
5 | running | Stemmer | run | Stem to root form
6 | run | Synonym | run | No synonym found
7 | fast | Lowercase | fast | Already lowercase
8 | fast | Stemmer | fast | Stemmer keeps 'fast'
9 | fast | Synonym | quick, fast | Synonym expands to two tokens
10 | runners | Lowercase | runners | Already lowercase
11 | runners | Stemmer | runner | Stem to root form
12 | runner | Synonym | runner | No synonym found
13 | End | - | - | All tokens processed
💡 All tokens processed through lowercase, stemmer, and synonym filters.
Variable Tracker
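Token | Original | After Lowercase | After Stemmer | After Synonym
------|----------|-----------------|---------------|--------------
Token1 | Quickly | quickly | quick | quick, fast
Token2 | running | running | run | run
Token3 | fast | fast | fast | quick, fast
Token4 | runners | runners | runner | runner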
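The "After Synonym" column holds two terms for Token1 and Token3, yet the text still has four word positions. The sketch below models that by attaching a position to each term. It assumes that an injected synonym shares the position of the token it was derived from, which is how the `_analyze` response normally reports them. `Token` and `expand_with_positions` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# One synonym group, mirroring the rule "quick,fast" (illustrative only).
SYNONYM_GROUPS = [["quick", "fast"]]


@dataclass(frozen=True)
class Token:
    text: str
    position: int


def expand_with_positions(stemmed_tokens):
    """Expand synonyms while keeping every expansion at the source position."""
    out = []
    for position, text in enumerate(stemmed_tokens):
        group = next((g for g in SYNONYM_GROUPS if text in g), [text])
        out.extend(Token(term, position) for term in group)
    return out


# Input is the "After Stemmer" column of the tracker.
tokens = expand_with_positions(["quick", "run", "fast", "runner"])
for tok in tokens:
    print(tok.position, tok.text)
# 0 quick
# 0 fast
# 1 run
# 2 quick
# 2 fast
# 3 runner
```

Six terms come out, but they occupy only four positions, so the number of word slots in the text is unchanged by the expansion.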
Key Moments - 3 Insights
Why does the token 'Quickly' become two tokens 'quick' and 'fast' after the synonym filter?
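By the time the synonym filter runs, 'Quickly' has already been lowercased and stemmed to 'quick'. The rule "quick,fast" treats the two terms as equivalent, so the filter emits both 'quick' and 'fast' for that one token, as shown in execution_table row 3.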
Does the stemmer always shorten words to their root form?
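Not always. The stemmer applies suffix-stripping rules rather than looking words up in a dictionary: it reduces 'running' to 'run' and 'runners' to 'runner' (rows 5 and 11), but leaves 'fast' unchanged (row 8). The exact output depends on the stemming algorithm selected by the "language" setting.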
Why is the lowercase filter applied before the stemmer and synonym filters?
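The lowercase filter runs first so that every token reaches the stemmer and synonym filters in lowercase. The synonym rule is written in lowercase ("quick,fast"), so a token such as 'Quick' might not match it otherwise. The order follows the filter list in the analyzer and is reflected in the execution_table.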
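The effect of filter order can be shown with a small experiment. This is a toy model with a case-sensitive, exact-match synonym lookup, which is an assumption of the sketch rather than a statement about every Elasticsearch configuration. `run_chain` and the filter functions are hypothetical names.

```python
# One synonym group, written in lowercase like the rule "quick,fast".
SYNONYM_GROUPS = [["quick", "fast"]]


def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def expand_synonyms(tokens):
    """Exact, case-sensitive lookup against the synonym groups."""
    out = []
    for t in tokens:
        group = next((g for g in SYNONYM_GROUPS if t in g), None)
        out.extend(group if group else [t])
    return out


def run_chain(tokens, filters):
    """Apply the filters one after another, in the given order."""
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = ["Quick", "Fast"]
with_lowercase = run_chain(tokens, [lowercase, expand_synonyms])
without_lowercase = run_chain(tokens, [expand_synonyms])

print(with_lowercase)     # ['quick', 'fast', 'quick', 'fast']
print(without_lowercase)  # ['Quick', 'Fast'] (no rule matched)
```

Without the lowercase step the capitalized tokens pass through untouched, so neither of them picks up its synonym.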
Visual Quiz - 3 Questions
Test your understanding
In the execution_table, what is the output token after the stemmer filter for 'running'?
A. run
B. running
C. runner
D. ran
💡 Hint
Check execution_table row 5 under 'Output Token' after 'Stemmer' filter.
At which step does the synonym filter first expand a token into two tokens?
A. Step 9
B. Step 6
C. Step 3
D. Step 12
💡 Hint
Look for 'Synonym' filter steps that output two tokens in execution_table.
If the lowercase filter were removed, what would happen to the token 'Quickly'?
A. It would become 'quickly' anyway
B. It would remain 'Quickly' and might not match synonyms or stem correctly
C. It would be removed
D. It would become uppercase
💡 Hint
Refer to the importance of lowercase filter in key_moments and execution_table row 1.
Concept Snapshot
Token filters process tokens after splitting text.
Lowercase filter makes all letters small.
Stemmer reduces words to their root form.
Synonym filter replaces words with synonyms, possibly expanding tokens.
Filters apply in order and affect search indexing and matching.
Full Transcript
This visual execution shows how Elasticsearch token filters work step-by-step. First, the input text is split into tokens. Then, each token passes through the lowercase filter, which converts all letters to lowercase. Next, the stemmer filter reduces words to their root forms, like 'running' to 'run'. Finally, the synonym filter replaces tokens with their synonyms, sometimes expanding one token into multiple tokens, such as 'quick' becoming 'quick' and 'fast'. The execution table traces each token through these filters, showing how tokens change at each step. The variable tracker summarizes token states after each filter. Key moments clarify common confusions, like why synonyms expand tokens and why lowercase is applied first. The quiz tests understanding by asking about specific steps and effects of filters. This process helps Elasticsearch index and search text more effectively by normalizing and expanding tokens.