Custom analyzers in Elasticsearch - Time Complexity
When using custom analyzers in Elasticsearch, it is important to understand how processing time scales as the input text grows. Below, we analyze the time complexity of the following custom analyzer definition and its usage.
```console
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```
```console
GET /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "The Quick Brown Fox Jumps Over The Lazy Dog"
}
```
This code defines a custom analyzer that tokenizes text, converts tokens to lowercase, and removes stop words, then analyzes a sample text.
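The pipeline described above can be approximated in a few lines of Python. This is a minimal sketch: the regex tokenizer and the stop-word set only roughly mirror Lucene's `standard` tokenizer and default English `stop` filter, but the token-by-token structure is the same.

```python
import re

# Lucene's default English stop-word set (used by the "stop" filter)
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by",
              "for", "if", "in", "into", "is", "it", "no", "not", "of",
              "on", "or", "such", "that", "the", "their", "then",
              "there", "these", "they", "this", "to", "was", "will",
              "with"}

def analyze(text):
    tokens = re.findall(r"\w+", text)        # ~ "standard" tokenizer
    tokens = [t.lower() for t in tokens]     # "lowercase" filter
    return [t for t in tokens if t not in STOP_WORDS]  # "stop" filter

print(analyze("The Quick Brown Fox Jumps Over The Lazy Dog"))
# → ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```

Each stage makes one pass over the tokens, so the work done is proportional to the number of tokens in the input.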
To determine the complexity, look at the steps that repeat as the input text grows.
- Primary operation: Tokenizing and filtering each word in the input text.
- How many times: Once for each token (word) in the input text.
As the input text gets longer, the analyzer processes more tokens one by one.
| Input Size (n tokens) | Approx. Operations |
|---|---|
| 10 | About 10 token processing steps |
| 100 | About 100 token processing steps |
| 1000 | About 1000 token processing steps |
Pattern observation: The time grows roughly in direct proportion to the number of tokens.
Time Complexity: O(n)
This means the time to analyze text grows linearly with the number of words in the input.
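The pattern in the table can be made concrete with a hypothetical step counter. The cost model here is an assumption (one tokenizer pass plus one pass per filter, each touching every token); Elasticsearch exposes no such counter, but the model captures why the growth is linear.

```python
def analysis_steps(n_tokens, n_filters=2):
    # Hypothetical cost model: one tokenizer pass plus one pass per
    # filter, each touching every token once.
    return n_tokens * (1 + n_filters)

for n in (10, 100, 1000):
    print(f"{n} tokens -> {analysis_steps(n)} steps")
# Steps grow by 10x whenever the token count grows by 10x: O(n).
```

The constant factor (3 passes here) does not change the linear shape of the growth.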
[X] Wrong: "Custom analyzers process the entire text all at once, so time does not depend on input size."
[OK] Correct: Actually, analyzers work token by token, so more words mean more processing steps and longer time.
Understanding how custom analyzers scale helps you explain performance when handling large text data in search systems.
What if we added more filters to the custom analyzer? How would that affect the time complexity?
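One way to reason about it: each additional token filter adds another pass over the token stream, so the constant factor per token grows, but the time still scales linearly with the number of tokens. A sketch (the filter functions here are illustrative stand-ins, not real Elasticsearch filters):

```python
def analyze_with_filters(tokens, filters):
    # Each filter makes one full pass over the token stream,
    # so total work is roughly len(tokens) * len(filters): still O(n).
    for f in filters:
        tokens = [f(t) for t in tokens]
    return tokens

tokens = ["Quick", "Brown", "Fox"]
filters = [str.lower, lambda t: t[::-1]]  # e.g. lowercase + a "reverse" filter
print(analyze_with_filters(tokens, filters))
# → ['kciuq', 'nworb', 'xof']
```

Adding filters increases the per-token cost by a constant, so the overall time complexity remains O(n) in the input size.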