Challenge - 5 Problems
Elasticsearch Analyzer Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 2:00
What is the output of this Elasticsearch analyzer test?
Given the following analyzer configuration and input text, what is the list of tokens produced?
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      },
      "filter": {
        "stop": {
          "type": "stop",
          "stopwords": ["the", "is"]
        }
      }
    }
  }
}
Input text: "The quick brown fox is jumping"
💡 Hint
Remember that the stop filter removes specified stopwords after tokenizing and lowercasing.
✅ Explanation
The standard tokenizer splits the text into the words "The", "quick", "brown", "fox", "is", "jumping". The lowercase filter converts all tokens to lowercase, and the stop filter then removes the configured stopwords "the" and "is", leaving "quick", "brown", "fox", and "jumping".
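This three-stage pipeline can be sketched in plain Python. The regex split is only an approximation of the standard tokenizer (which actually follows Unicode text segmentation rules), but it behaves the same on this input:

```python
import re

def analyze(text, stopwords=("the", "is")):
    # Approximate the standard tokenizer: split on runs of non-word characters.
    tokens = [t for t in re.split(r"\W+", text) if t]
    # Lowercase filter.
    tokens = [t.lower() for t in tokens]
    # Stop filter: drop the configured stopwords (applied after lowercasing).
    return [t for t in tokens if t not in stopwords]

print(analyze("The quick brown fox is jumping"))
# → ['quick', 'brown', 'fox', 'jumping']
```

Note that the filter chain runs in the order listed in the analyzer, which is why "The" is already "the" by the time the stop filter sees it.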
❓ Predict Output
intermediate · 2:00
What tokens result from this custom analyzer with a pattern tokenizer?
Given this analyzer configuration and input, what tokens are produced?
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pattern_analyzer": {
          "tokenizer": "pattern",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "pattern": {
          "type": "pattern",
          "pattern": "\\W+"
        }
      }
    }
  }
}
Input text: "Hello, World! Welcome to Elasticsearch."
💡 Hint
The pattern tokenizer splits on non-word characters, and the lowercase filter converts tokens to lowercase.
✅ Explanation
The pattern tokenizer splits the text on any run of non-word characters (spaces, commas, and other punctuation), so no punctuation survives tokenization. The lowercase filter then converts every token to lowercase. The empty string produced by the trailing period is discarded, yielding "hello", "world", "welcome", "to", "elasticsearch".
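Because the pattern tokenizer really does split on a configured regex, a Python sketch of this analyzer is close to the real behavior:

```python
import re

def pattern_analyze(text, pattern=r"\W+"):
    # Pattern tokenizer: split on the configured regex (here \W+, i.e. runs
    # of non-word characters) and discard empty tokens, e.g. from a
    # trailing period.
    tokens = [t for t in re.split(pattern, text) if t]
    # Lowercase filter.
    return [t.lower() for t in tokens]

print(pattern_analyze("Hello, World! Welcome to Elasticsearch."))
# → ['hello', 'world', 'welcome', 'to', 'elasticsearch']
```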
🔧 Debug
advanced · 2:00
Why does this analyzer configuration cause an error?
This analyzer configuration causes an error when indexing. What is the cause?
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "bad_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "nonexistent_filter"]
        }
      }
    }
  }
}
💡 Hint
Check if all filters used are defined in the analysis settings.
✅ Explanation
Elasticsearch requires that all filters used in an analyzer be defined or built-in. "nonexistent_filter" is not defined anywhere, so Elasticsearch throws a configuration error.
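The check can be mimicked (not reproduced) in a few lines of Python; the filter set and error message below are illustrative, not Elasticsearch's actual internals:

```python
# Illustrative subset of built-in token filter names; not the full list.
BUILT_IN_FILTERS = {"lowercase", "uppercase", "stop", "asciifolding", "trim"}

def validate_analyzer(analyzer, custom_filters):
    # Every filter referenced by the analyzer must be either built-in or
    # defined under the analysis settings, else configuration fails.
    for name in analyzer["filter"]:
        if name not in BUILT_IN_FILTERS and name not in custom_filters:
            raise ValueError(f"unknown token filter [{name}]")

bad_analyzer = {"tokenizer": "standard",
                "filter": ["lowercase", "nonexistent_filter"]}
try:
    validate_analyzer(bad_analyzer, custom_filters={})
except ValueError as e:
    print(e)  # → unknown token filter [nonexistent_filter]
```

Defining "nonexistent_filter" under `analysis.filter` (or removing it from the chain) resolves the error.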
❓ Predict Output
advanced · 2:00
What tokens does this analyzer with a synonym filter produce?
Given this analyzer configuration and input, what tokens are produced?
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonym"]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": ["quick,fast"]
        }
      }
    }
  }
}
Input text: "The quick fox"
💡 Hint
By default, the synonym filter adds synonyms as extra tokens; it does not replace the original tokens.
✅ Explanation
The standard tokenizer splits the text into words, and the lowercase filter converts them to lowercase. The synonym filter then adds "fast" as an extra token at the same position as "quick". There is no stop filter here, so "the" survives, and the tokens are "the", "quick", "fast", "fox".
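A Python sketch of the expansion (the synonym table below is a hypothetical mirror of the "quick,fast" rule; real Elasticsearch also records that the synonym shares the original token's position, which is omitted here):

```python
import re

# "quick,fast" in expand mode: each term maps to the other terms in its group.
SYNONYMS = {"quick": ["fast"], "fast": ["quick"]}

def synonym_analyze(text):
    # Standard tokenizer approximated by a regex split, then lowercase filter.
    tokens = [t.lower() for t in re.split(r"\W+", text) if t]
    # Synonym filter: keep each original token and append its synonyms
    # as extra tokens.
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out

print(synonym_analyze("The quick fox"))
# → ['the', 'quick', 'fast', 'fox']
```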
🧠 Conceptual
expert · 3:00
Which filter order produces this token output?
You want to produce tokens from the input "Running runs run" that are stemmed and lowercase, but stopwords are removed after stemming. Which filter order in the analyzer produces the tokens ["run", "run", "run"]?
💡 Hint
Think about when stopwords are removed relative to stemming and lowercasing.
✅ Explanation
The tokens are first lowercased, then stemmed to "run", and only then are stopwords removed: the order is lowercase, stemmer, stop. Since "run" is not a stopword, all three tokens survive. If the stop filter ran before the stemmer, stopword removal would see the unstemmed forms "running" and "runs" instead, which could change the result for other inputs.
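The ordering can be sketched with a toy stemmer standing in for a real stemmer filter (the suffix-stripping rules below are illustrative, not any actual stemming algorithm):

```python
def stem(token):
    # Toy stemmer: strip a few English suffixes; "ning" is checked before
    # "ing" so that "running" reduces to "run" rather than "runn".
    for suffix in ("ning", "ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text, stopwords=frozenset()):
    tokens = text.split()                             # stand-in tokenizer
    tokens = [t.lower() for t in tokens]              # 1. lowercase
    tokens = [stem(t) for t in tokens]                # 2. stemmer
    return [t for t in tokens if t not in stopwords]  # 3. stop, after stemming

print(analyze("Running runs run"))
# → ['run', 'run', 'run']
```

Running the stop filter last guarantees it tests the stemmed, lowercased forms, which is what the question's required output depends on.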