Standard analyzer in Elasticsearch - Time & Space Complexity
We want to understand how analysis time grows with input length when using Elasticsearch's standard analyzer.
Specifically, how does processing more words affect the work done?
Analyze the time complexity of the following Elasticsearch standard analyzer usage.
```
POST _analyze
{
  "analyzer": "standard",
  "text": "The quick brown fox jumps over the lazy dog"
}
```
This request asks Elasticsearch to break the text into tokens using the standard analyzer, which splits the text on word boundaries, drops punctuation, and lowercases each token.
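The real analyzer is implemented inside Lucene, but its per-word behavior can be illustrated with a simplified Python sketch. The `simple_standard_analyze` function below is a hypothetical stand-in, not Elasticsearch's actual implementation:

```python
import re

def simple_standard_analyze(text):
    """Rough approximation of the standard analyzer:
    split on non-word characters, drop empty strings, lowercase each token."""
    return [token.lower() for token in re.split(r"\W+", text) if token]

tokens = simple_standard_analyze("The quick brown fox jumps over the lazy dog")
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

Note that each word is visited exactly once, which is the key fact behind the complexity analysis below.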
Look at what repeats as the input grows.
- Primary operation: Tokenizing each word in the input text.
- How many times: Once for each word in the text.
As the number of words increases, the analyzer processes each word once.
| Input Size (n words) | Approx. Operations |
|---|---|
| 10 | About 10 token operations |
| 100 | About 100 token operations |
| 1000 | About 1000 token operations |
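The linear pattern in the table can be checked with a toy tokenizer (a hypothetical sketch, not Elasticsearch itself), counting one operation per token produced:

```python
import re

def count_token_ops(text):
    # One "operation" per token: split on non-word characters, lowercase each.
    return len([t.lower() for t in re.split(r"\W+", text) if t])

for n in (10, 100, 1000):
    text = " ".join(["word"] * n)    # synthetic input of n words
    print(n, count_token_ops(text))  # operation count grows in lockstep with n
```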
Pattern observation: The work grows directly with the number of words, so doubling words doubles the work.
Time Complexity: O(n)
This means the time to analyze text grows linearly with the number of words. Space complexity is also O(n): the analyzer emits one token per word, so the resulting token list grows at the same rate as the input.
[X] Wrong: "The standard analyzer processes the whole text in one step, so time does not depend on text length."
[OK] Correct: The analyzer actually looks at each word separately, so more words mean more work.
Understanding how text analysis time grows helps you explain performance in search systems and shows you can think about scaling real data.
What if the analyzer also applied complex stemming rules to each word? How would the time complexity change?
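One way to reason about this question: a stemmer still runs once per token and does a bounded amount of work on each word, so the growth stays O(n), just with a larger constant factor per word. A hedged sketch follows; the suffix-stripping `toy_stem` below is a made-up stand-in for a real stemmer such as Porter:

```python
import re

def toy_stem(word):
    # Crude suffix stripping. Real stemmers apply many more rules,
    # but still perform a constant amount of work per word.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def analyze_with_stemming(text):
    # Tokenize + lowercase + stem: still a single pass over the words -> O(n).
    return [toy_stem(t.lower()) for t in re.split(r"\W+", text) if t]

print(analyze_with_stemming("The quick brown fox jumps over the lazy dog"))
```

Each extra per-word filter multiplies the constant but leaves the shape of the growth unchanged: doubling the words still doubles the work.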