Token filters modify the tokens that a tokenizer produces so that searches match more reliably. They can lowercase tokens, reduce words to their root form (stemming), or add synonyms.
Token filters (lowercase, stemmer, synonym) in Elasticsearch
Introduction
You want matches regardless of letter case, so 'Car' and 'car' are treated the same.
You want to match different forms of a word, such as 'run', 'running', and 'runs'.
You want related words such as 'car' and 'automobile' to return the same results.
You want synonyms handled automatically during analysis instead of rewriting each query by hand.
You want search to tolerate small differences between what users type and what documents contain.
Syntax
Elasticsearch
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_lowercase": {
          "type": "lowercase"
        },
        "my_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "my_synonym": {
          "type": "synonym",
          "synonyms": [
            "car, automobile",
            "quick, fast"
          ]
        }
      },
      "analyzer": {
        "my_custom_analyzer": {
          "tokenizer": "standard",
          "filter": ["my_lowercase", "my_stemmer", "my_synonym"]
        }
      }
    }
  }
}
The lowercase filter converts every token to lowercase.
The stemmer filter reduces words to their root form.
The synonym filter adds tokens for words with the same meaning.
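The lowercase and stemmer filters can be checked on their own with the _analyze API, which does not need an index. The two requests below are a minimal sketch that assumes a running cluster; the sample texts are made up for illustration and are not part of the index defined above.

```console
# Lowercase only (illustrative sample text)
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "Quick BROWN Fox"
}

# English stemmer only, defined inline for this one request
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    { "type": "stemmer", "language": "english" }
  ],
  "text": "run running runs"
}
```

The first request should return the tokens quick, brown, and fox. The second should return run three times.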
Examples
This filter changes all tokens to lowercase.
Elasticsearch
"filter": { "lowercase_filter": { "type": "lowercase" } }
This filter reduces English words to their root form.
Elasticsearch
"filter": { "english_stemmer": { "type": "stemmer", "language": "english" } }
This filter treats listed words as synonyms during search.
Elasticsearch
"filter": { "synonym_filter": { "type": "synonym", "synonyms": ["fast, quick", "car, automobile"] } }
Sample Program
This example creates an index with lowercase, stemmer, and synonym filters. Then it analyzes the text 'Cars are running fast' to show how tokens change.
Elasticsearch
PUT /example_index
{
  "settings": {
    "analysis": {
      "filter": {
        "lowercase_filter": {
          "type": "lowercase"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "synonym_filter": {
          "type": "synonym",
          "synonyms": ["car, automobile", "fast, quick"]
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase_filter", "english_stemmer", "synonym_filter"]
        }
      }
    }
  }
}
GET /example_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Cars are running fast"
}
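To see how the tokens change at each step rather than only the final result, the same request can be sent with the explain option of the _analyze API. This is a sketch that assumes example_index was created as shown above.

```console
# Same analyze request as above, with "explain" added
GET /example_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Cars are running fast",
  "explain": true
}
```

The response should then include a detail section that lists the tokens after the tokenizer and after each token filter in turn, which makes it easier to spot which filter changed a given word.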
Important Notes
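Filter order matters: put lowercase before the stemmer and synonym filters so that they receive normalized tokens.
The synonym filter can load large synonym lists from an external file through the synonyms_path setting.
Stemming can be too aggressive and merge unrelated words, so test your filters on representative text.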
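As a sketch of the external-file option, the request below points a synonym filter at a file instead of listing the rules inline. The index name, filter name, analyzer name, and file path are all hypothetical. The path is resolved relative to the Elasticsearch config directory, and the file is assumed to hold one rule per line in the same format as the inline lists, for example car, automobile.

```console
# Hypothetical names and path; the file must be present on every node
PUT /file_synonym_index
{
  "settings": {
    "analysis": {
      "filter": {
        "file_synonym": {
          "type": "synonym",
          "synonyms_path": "analysis/synonyms.txt"
        }
      },
      "analyzer": {
        "file_synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "file_synonym"]
        }
      }
    }
  }
}
```

Keeping the list in a file lets it grow without editing the index settings request, at the cost of having to distribute the file to each node.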
Summary
Token filters change words to improve search matching.
The lowercase filter converts all tokens to lowercase.
The stemmer filter reduces words to their root forms.
The synonym filter adds words with the same meaning so that either one matches.