
Character filters in Elasticsearch - Step-by-Step Execution

Concept Flow - Character filters
Input Text -> Apply Character Filters -> Modified Text -> Tokenizer -> Tokens for Analysis
Character filters take the input text and change characters before tokenizing, preparing text for analysis.
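The flow above can be sketched in a few lines of Python. This only illustrates the ordering and is not how Elasticsearch is implemented; the names `analyze`, `ampersand_to_and`, and `whitespace_tokenizer` are hypothetical.

```python
from typing import Callable, Iterable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]


def analyze(text: str, char_filters: Iterable[CharFilter], tokenizer: Tokenizer) -> List[str]:
    """Run every character filter in order, then tokenize the result."""
    for char_filter in char_filters:
        text = char_filter(text)  # each filter sees the previous filter's output
    return tokenizer(text)        # the tokenizer only ever sees the filtered text


# Hypothetical stand-ins for a character filter and a whitespace tokenizer.
def ampersand_to_and(text: str) -> str:
    return text.replace("&", "and")


whitespace_tokenizer = str.split

tokens = analyze("rock & roll", [ampersand_to_and], whitespace_tokenizer)
print(tokens)
```

With an empty filter list, `analyze` hands the text to the tokenizer unchanged.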
Execution Sample
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_filter": {
          "type": "mapping",
          "mappings": ["&=>and", "@=>at"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["my_filter"],
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
This config defines a mapping character filter, my_filter, that replaces '&' with 'and' and '@' with 'at' before the whitespace tokenizer splits the text on whitespace.
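To see roughly what `my_filter` does to the text, here is a plain-Python simulation. It assumes the mapping filter scans the text once, left to right, and that the longest matching key wins at each position, which is how the Elasticsearch documentation describes the matching. `parse_mappings` and `apply_mapping_filter` are made-up helper names, and `str.split` stands in for the whitespace tokenizer.

```python
def parse_mappings(rules):
    """Turn rules such as "&=>and" into a {key: replacement} dict."""
    table = {}
    for rule in rules:
        key, replacement = rule.split("=>", 1)
        key = key.strip()
        if not key:
            raise ValueError("a mapping rule needs a non-empty key")
        table[key] = replacement.strip()
    return table


def apply_mapping_filter(text, table):
    """Single left-to-right pass; the longest matching key wins (assumed behavior)."""
    keys = sorted(table, key=len, reverse=True)  # try longer keys first
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])  # emit the replacement
                i += len(key)           # skip past the matched key
                break
        else:
            out.append(text[i])         # no rule matched: copy the character
            i += 1
    return "".join(out)


table = parse_mappings(["&=>and", "@=>at"])
modified_text = apply_mapping_filter("rock & roll @ night", table)
tokens = modified_text.split()  # stand-in for the whitespace tokenizer
print(modified_text)
print(tokens)
```

Because both rules belong to one filter, the simulation rewrites '&' and '@' in the same pass. The Execution Table lists them as separate steps only to make each change visible.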
Execution Table
| Step | Input Text | Character Filter Applied | Modified Text | Tokenizer Output |
|------|------------|--------------------------|---------------|------------------|
| 1 | rock & roll @ night | Replace '&' with 'and' | rock and roll @ night | - |
| 2 | rock and roll @ night | Replace '@' with 'at' | rock and roll at night | - |
| 3 | rock and roll at night | No more filters | rock and roll at night | Tokens: ['rock', 'and', 'roll', 'at', 'night'] |
| 4 | End | All filters applied | Final text ready for tokenizing | Tokenization complete |
💡 All character filters applied; text is ready for tokenization.
Variable Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| input_text | rock & roll @ night | rock and roll @ night | rock and roll at night | rock and roll at night | rock and roll at night |
| modified_text | rock & roll @ night | rock and roll @ night | rock and roll at night | rock and roll at night | rock and roll at night |
| tokens | [] | [] | [] | ['rock', 'and', 'roll', 'at', 'night'] | ['rock', 'and', 'roll', 'at', 'night'] |
Key Moments - 3 Insights
Why does the text change before tokenization?
Character filters modify the raw text first (see execution_table steps 1 and 2) so tokenization works on the cleaned text.
Are character filters applied after tokenization?
No. Character filters run before tokenization and change characters in the original text; the tokenizer only runs at execution_table step 3, after every replacement has been made.
What happens if no character filters are defined?
The original text goes straight to tokenization without changes, so tokens reflect the raw input.
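A minimal sketch of that case, with Python's `str.split` standing in for the whitespace tokenizer (an assumption made for illustration):

```python
raw_text = "rock & roll @ night"

# No character filters are configured, so the tokenizer receives the raw text.
tokens = raw_text.split()

print(tokens)
print("and" in tokens)  # the word never appears because '&' was not rewritten
```

The symbols survive as their own tokens, so a search for the word "and" would have nothing to match in this text.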
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 2, what is the modified text after replacing '@'?
A. rock & roll at night
B. rock and roll @ night
C. rock and roll at night
D. rock & roll @ night
💡 Hint
Check the 'Modified Text' column at step 2 in the execution_table.
At which step does tokenization produce the final tokens?
A. Step 2
B. Step 3
C. Step 1
D. Step 4
💡 Hint
Look at the 'Tokenizer Output' column in the execution_table.
If we remove the mapping that replaces '&', what token would the tokenizer produce for '&'?
A. '&'
B. No token
C. 'and'
D. 'at'
💡 Hint
Refer to variable_tracker for 'tokens' after step 3 and consider what happens without the filter.
Concept Snapshot
Character filters modify input text before tokenizing.
They replace or remove characters.
Configured in analysis settings.
Run before tokenizer.
Help clean or normalize text.
Example: replace '&' with 'and'.
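Removal can be sketched the same way. The snippet below uses Python's `re.sub` as a stand-in for a regex-based character filter such as Elasticsearch's `pattern_replace` type. The helper name `remove_chars` and the pattern are assumptions, and Elasticsearch itself uses Java regular expressions, so the syntax can differ.

```python
import re


def remove_chars(text, pattern):
    """Delete every match of `pattern` from the text before tokenizing."""
    return re.sub(pattern, "", text)


cleaned = remove_chars("rock & roll @ night", r"[&@]")
tokens = cleaned.split()  # split() also collapses the leftover double spaces
print(tokens)
```

Here the symbols are dropped rather than spelled out, so they produce no tokens at all.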
Full Transcript
Character filters in Elasticsearch change the input text before it is split into tokens. For example, they can replace symbols like '&' with words like 'and'. This happens before the tokenizer runs. The process starts with the original text, then each character filter applies its changes in order. After all filters run, the tokenizer splits the cleaned text into tokens. This helps search work better by normalizing text. If no character filters are used, the tokenizer works on the raw text. The example shows replacing '&' with 'and' and '@' with 'at', then splitting by spaces to get tokens.