Character filters in Elasticsearch - Time Complexity
When using character filters in Elasticsearch, it is important to understand how processing time scales as the input text grows: how does the time to apply a character filter change with the size of the text?
Analyze the time complexity of the following character filter configuration and usage.
```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_mapping": {
          "type": "mapping",
          "mappings": ["ph=>f", "qu=>q"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["my_mapping"]
        }
      }
    }
  }
}
```
This request defines a character filter that rewrites the sequences "ph" to "f" and "qu" to "q" before the standard tokenizer sees the text.
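To make the mechanics concrete, here is a minimal Python sketch that simulates a mapping character filter outside Elasticsearch. The function name and the left-to-right, first-match replacement strategy are assumptions for illustration, not Elasticsearch's actual implementation:

```python
def apply_mapping_char_filter(text, mappings):
    """Simulate a mapping character filter: scan left to right,
    replacing the first mapping that matches at each position."""
    out = []
    i = 0
    while i < len(text):
        for src, dst in mappings:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)  # skip past the matched sequence
                break
        else:
            out.append(text[i])  # no mapping matched; keep the character
            i += 1
    return "".join(out)

mappings = [("ph", "f"), ("qu", "q")]
print(apply_mapping_char_filter("the quick phone", mappings))  # → the qick fone
```

Each character position is visited at most once, which is the behavior the complexity analysis below relies on.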
Look at what happens when the filter runs on input text.
- Primary operation: Scanning the entire input text to find and replace matching character sequences.
- How many times: Once per character in the input, checking if a mapping applies.
As the input text gets longer, the filter must check more characters for replacements.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks |
| 100 | About 100 checks |
| 1000 | About 1000 checks |
Pattern observation: The work grows directly with the length of the input text.
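The linear pattern in the table can be checked empirically with a sketch that counts how many positions a simulated filter visits (the counter and input sizes are illustrative assumptions):

```python
def count_checks(text, mappings):
    """Count how many character positions the filter visits
    while scanning the input once, left to right."""
    checks = 0
    i = 0
    while i < len(text):
        checks += 1
        matched = False
        for src, dst in mappings:
            if text.startswith(src, i):
                i += len(src)
                matched = True
                break
        if not matched:
            i += 1
    return checks

mappings = [("ph", "f"), ("qu", "q")]
for n in (10, 100, 1000):
    text = "a" * n  # worst case: no matches, so every position is checked
    print(n, count_checks(text, mappings))  # → 10 10, 100 100, 1000 1000
```

The number of checks grows in direct proportion to the input length, matching the table above.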
Time Complexity: O(n)
This means the time to apply character filters grows linearly with the input size.
[X] Wrong: "Character filters run in constant time no matter how long the text is."
[OK] Correct: The filter must check each character or character pair to apply replacements, so longer text means more work.
Understanding how text processing scales helps you explain performance in search systems and text analysis tasks.
What if the character filter used multiple mappings with overlapping sequences? How would that affect the time complexity?
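As a self-contained sketch of this question (the overlap-handling rule here, longest match first, is an assumption; Elasticsearch's actual tie-breaking may differ): with overlapping mappings, each position is tested against every pattern, so the cost is roughly O(n · m) for m mappings, which is still linear in n for a fixed mapping set.

```python
def filter_with_overlaps(text, mappings):
    """Apply overlapping mappings, trying longer patterns first
    so that "qui" wins over "qu" (assumed tie-breaking rule)."""
    mappings = sorted(mappings, key=lambda m: -len(m[0]))
    out, i = [], 0
    while i < len(text):
        for src, dst in mappings:  # up to m pattern checks per position
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

overlapping = [("qu", "q"), ("qui", "ki")]
print(filter_with_overlaps("quick quote", overlapping))  # → kick qote
```

Adding more mappings increases the constant factor m, but the filter still makes a single pass over the text, so the complexity remains O(n) with respect to input size.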