Token filters (lowercase, stemmer, synonym) in Elasticsearch - Time & Space Complexity
When Elasticsearch analyzes text, it transforms each token step by step through token filters such as lowercase, stemmer, and synonym. Understanding how long this takes helps us reason about search and indexing speed.
We want to see how the time to process text grows as the text gets longer or more complex.
Analyze the time complexity of the following Elasticsearch analyzer using token filters.
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "stemmer", "synonym"]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms": ["quick,fast"]
        },
        "stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  }
}
```
This analyzer breaks text into words with the standard tokenizer, then lowercases each token, stems it to its root form, and substitutes synonyms.
Look at what happens to each word (token) in the text.
- Primary operation: Each token passes through three filters one after another.
- How many times: Once per token in the input text.
As the number of words grows, the total work grows too, because each word is processed by all filters.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 30 (10 tokens x 3 filters) |
| 100 | 300 (100 tokens x 3 filters) |
| 1000 | 3000 (1000 tokens x 3 filters) |
Pattern observation: The total work grows directly with the number of tokens; doubling tokens doubles work.
Time Complexity: O(n)
This means the time to process text grows in a straight line with the number of words.
[X] Wrong: "Adding more filters multiplies the time complexity to something like O(n²)."
[OK] Correct: Each filter processes tokens one after another, so time grows linearly with tokens, not squared. More filters add a fixed number of steps per token, not nested loops.
Knowing how token filters affect processing time helps you explain search speed and indexing performance clearly. This skill shows you understand how text analysis scales with data size.
What if we added a filter that compares each token to every other token? How would the time complexity change?
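As a sketch of the answer (this is a hypothetical filter, not one Elasticsearch ships): a filter that compares every token against every other token, say to flag duplicates, performs n × (n − 1) comparisons, so the analyzer would become O(n²) and doubling the input would roughly quadruple the work.

```python
def mark_duplicates(tokens):
    # Hypothetical pairwise filter: flag each token that appears elsewhere
    # in the stream. Every token is compared with every other token.
    comparisons = 0
    flagged = []
    for i, t in enumerate(tokens):
        is_dup = False
        for j, other in enumerate(tokens):
            if i != j:
                comparisons += 1
                if t == other:
                    is_dup = True
        flagged.append((t, is_dup))
    return flagged, comparisons

_, c10 = mark_duplicates(["tok"] * 10)
_, c20 = mark_duplicates(["tok"] * 20)
print(c10, c20)  # 90 380 -> doubling tokens roughly quadruples the work
```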