Recall & Review
beginner
What is a character filter in Elasticsearch?
A character filter is a preprocessing step that modifies the text before tokenization. It changes or removes characters to prepare the text for analysis.
beginner
Name two built-in character filters in Elasticsearch.
Two built-in character filters are html_strip, which removes HTML tags, and mapping, which replaces characters or strings based on a list of mappings.
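To see html_strip in action, a request like the following could be sent to the _analyze endpoint (for example POST /_analyze). This is a minimal sketch: the sample text is made up for illustration, and the standard tokenizer is an assumed choice.

```json
{
  "tokenizer": "standard",
  "char_filter": ["html_strip"],
  "text": "<p>Hello <b>world</b>!</p>"
}
```

The tags are stripped before the tokenizer runs, so the response should contain tokens for the words only (Hello and world), not for the markup.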
intermediate
How does the mapping character filter work?
The mapping filter replaces specified characters or strings with other characters or strings before tokenization. For example, it can replace accented letters with plain letters.
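The accented-letter case can be tried without creating an index by defining the filter inline in an _analyze request (for example POST /_analyze). This is a sketch only: the two mappings and the sample text are illustrative assumptions, not a complete accent-folding table.

```json
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["é => e", "ü => u"]
    }
  ],
  "text": "café für"
}
```

With these mappings the tokenizer receives "cafe fur", so the resulting tokens should be cafe and fur.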
beginner
Why use character filters before tokenization?
Character filters clean or normalize text by removing unwanted characters or replacing them. This helps tokenizers produce consistent tokens and improves search accuracy.
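As one illustration of why this matters, the standard tokenizer splits a hyphenated number into separate tokens. The sketch below uses pattern_replace, a third built-in character filter not covered in these cards, to rewrite the hyphens between digits so the number survives tokenization as a single token. The sample text is an assumption for illustration, and the body is meant for the _analyze endpoint (for example POST /_analyze).

```json
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "(\\d+)-(?=\\d)",
      "replacement": "$1_"
    }
  ],
  "text": "My credit card is 123-456-789"
}
```

Without the character filter, the number would likely come back as three tokens (123, 456, 789). With it, the tokenizer sees 123_456_789 and should keep it together.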
intermediate
Show a simple example of a custom character filter using mapping in Elasticsearch.
Example:
{
  "analysis": {
    "char_filter": {
      "my_mapping": {
        "type": "mapping",
        "mappings": ["æ=>ae", "œ=>oe"]
      }
    }
  }
}
This replaces 'æ' with 'ae' and 'œ' with 'oe' before tokenization.
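A character filter defined this way only takes effect once an analyzer references it. The sketch below shows one way to wire my_mapping into a custom analyzer in the settings of a new index. The analyzer name my_analyzer, the standard tokenizer, and the lowercase token filter are assumptions chosen for illustration. The body would be sent when creating the index (for example PUT /my-index, where my-index is a hypothetical name).

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_mapping": {
          "type": "mapping",
          "mappings": ["æ=>ae", "œ=>oe"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["my_mapping"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

The order inside the analyzer mirrors the analysis pipeline: character filters run first, then the tokenizer, then token filters.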
What is the main purpose of a character filter in Elasticsearch?
Character filters change or clean text before it is split into tokens.
Which character filter removes HTML tags from text?
The html_strip filter removes HTML tags before tokenization.
What does the mapping character filter do?
The mapping filter replaces specified characters or strings with others before tokenization.
When are character filters applied in the analysis process?
Character filters run before tokenization to prepare the text.
Which of these is NOT a function of character filters?
Splitting text into tokens is done by tokenizers, not character filters.
Explain what character filters do in Elasticsearch and why they are important.
Think about how text is prepared before breaking it into words.
Describe how you would use a mapping character filter to replace special characters in text.
Consider replacing accented letters with plain letters.