Challenge - 5 Problems
Character Filter Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
What is the output of this character filter configuration?
Given the following Elasticsearch analyzer configuration using a mapping character filter, what will be the output tokens for the input text
"foo-bar baz"?Elasticsearch
{
"analysis": {
"char_filter": {
"my_mapping": {
"type": "mapping",
"mappings": ["- => ""]"]
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": ["my_mapping"],
"tokenizer": "whitespace"
}
}
}
}
💡 Hint
Think about what the mapping character filter does to the dash (-) character before tokenization.
✗ Incorrect
The mapping character filter replaces the dash (-) with an empty string, so "foo-bar" becomes "foobar" before the whitespace tokenizer splits the text. Thus, the tokens are ["foobar", "baz"].
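The two-step pipeline can be sketched in plain Python (a simulation for illustration, not the Elasticsearch implementation): the character filter rewrites the raw text first, then the whitespace tokenizer splits the result.

```python
def mapping_char_filter(text, mappings):
    # Apply each "source => target" rule to the raw text,
    # mimicking Elasticsearch's mapping character filter.
    for rule in mappings:
        source, target = (part.strip() for part in rule.split("=>"))
        text = text.replace(source, target)
    return text

def whitespace_tokenizer(text):
    # The whitespace tokenizer splits on runs of whitespace.
    return text.split()

filtered = mapping_char_filter("foo-bar baz", ["- => "])
print(whitespace_tokenizer(filtered))  # ['foobar', 'baz']
```

Because the dash is removed before tokenization, the tokenizer never sees "foo-bar" as two parts.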
🧠 Conceptual
Intermediate · 1:30 remaining
Which character filter type removes HTML tags from input text?
In Elasticsearch, which character filter type is designed to remove HTML tags from the input text before tokenization?
💡 Hint
Think about the filter that cleans HTML content.
✗ Incorrect
The html_strip character filter removes HTML tags from the input text, cleaning it before tokenization.

❓ Predict Output
Advanced · 2:00 remaining
What error does this character filter configuration cause?
Consider this Elasticsearch character filter configuration snippet. What error will Elasticsearch raise when trying to create this analyzer?
Elasticsearch
{
"analysis": {
"char_filter": {
"bad_filter": {
"type": "mapping",
"mappings": "- >"
}
},
"analyzer": {
"test_analyzer": {
"type": "custom",
"char_filter": ["bad_filter"],
"tokenizer": "standard"
}
}
}
}
💡 Hint
Check the syntax of the mapping string.
✗ Incorrect
The mapping string "- >" is invalid because the mapping syntax requires '=>' to separate the source and target characters. This causes an ElasticsearchParseException.
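The check Elasticsearch applies to each rule can be approximated in Python (a simplified sketch of the validation, not the real parser):

```python
def parse_mapping_rule(rule):
    # A mapping rule must contain '=>' separating source and target;
    # a rule like "- >" has no separator and is rejected.
    if "=>" not in rule:
        raise ValueError(f"Invalid mapping rule: [{rule}]")
    source, target = rule.split("=>", 1)
    if not source.strip():
        raise ValueError(f"Invalid mapping rule: [{rule}] - empty source")
    return source.strip(), target.strip()

print(parse_mapping_rule("- => "))  # ('-', '')
try:
    parse_mapping_rule("- >")
except ValueError as err:
    print(err)  # Invalid mapping rule: [- >]
```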
🚀 Application
Advanced · 1:30 remaining
How many tokens are produced by this analyzer?
Given this analyzer configuration with a pattern_replace character filter that removes digits, how many tokens will be produced from the input
"abc123 def456 ghi789"?Elasticsearch
{
"analysis": {
"char_filter": {
"remove_digits": {
"type": "pattern_replace",
"pattern": "\\d",
"replacement": ""
}
},
"analyzer": {
"digit_remover": {
"type": "custom",
"char_filter": ["remove_digits"],
"tokenizer": "whitespace"
}
}
}
}
💡 Hint
Digits are removed before tokenization, but spaces remain.
✗ Incorrect
The pattern_replace filter removes all digits, so the input becomes "abc def ghi". The whitespace tokenizer splits on spaces, producing 3 tokens.
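The same result can be reproduced with a small Python simulation (an illustration of the behavior, not the Elasticsearch code): a regex substitution stands in for the pattern_replace filter, and a whitespace split stands in for the tokenizer.

```python
import re

def pattern_replace_char_filter(text, pattern, replacement):
    # Mimics the pattern_replace char filter: a regex substitution
    # applied to the raw text before tokenization.
    return re.sub(pattern, replacement, text)

filtered = pattern_replace_char_filter("abc123 def456 ghi789", r"\d", "")
tokens = filtered.split()          # whitespace tokenizer
print(tokens, len(tokens))         # ['abc', 'def', 'ghi'] 3
```

Note that the spaces are untouched by the pattern, so the token count stays at three even though every digit is gone.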
🔧 Debug
Expert · 2:30 remaining
Why does this analyzer produce unexpected tokens?
An Elasticsearch analyzer uses this character filter configuration but produces tokens with underscores instead of spaces. What is the cause?
Elasticsearch
{
"analysis": {
"char_filter": {
"underscore_to_space": {
"type": "mapping",
"mappings": ["_ => \\u0020"]
}
},
"analyzer": {
"custom_analyzer": {
"type": "custom",
"char_filter": ["underscore_to_space"],
"tokenizer": "whitespace"
}
}
}
}
💡 Hint
Check how JSON strings handle backslashes.
✗ Incorrect
In JSON, a backslash must itself be escaped. Written as "_ => \u0020", the escape sequence is consumed by the JSON parser before Elasticsearch ever sees it, so the mapping rule does not carry the intended \u0020 sequence. It should be "_ => \\u0020", which passes the literal \u0020 through to the mapping parser, which then decodes it as a space character.