Challenge - 5 Problems
Analyzer Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
What tokens does this _analyze request output?
Given the following Elasticsearch _analyze API request using the standard analyzer, what tokens will be produced?
Elasticsearch
{
"analyzer": "standard",
"text": "Quick Brown Foxes"
}
💡 Hint
The standard analyzer splits text on word boundaries and lowercases every token.
✗ Incorrect
The standard analyzer uses grammar-based tokenization (Unicode Text Segmentation) to break text into terms, then lowercases every token. So 'Quick Brown Foxes' becomes ['quick', 'brown', 'foxes'].
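The standard analyzer's behavior on this input can be approximated in plain Python. This is a sketch only: the real analyzer implements Unicode Text Segmentation, but for simple ASCII text like this, splitting on runs of non-alphanumeric characters and lowercasing gives the same tokens:

```python
import re

def standard_analyze(text):
    # Rough approximation of the standard analyzer:
    # split on runs of non-alphanumeric characters, then lowercase.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

print(standard_analyze("Quick Brown Foxes"))
# ['quick', 'brown', 'foxes']
```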
❓ Predict Output
Intermediate · 2:00 remaining
What tokens does the whitespace analyzer produce?
Using the _analyze API with the whitespace analyzer on the text "Hello, World! Welcome to Elasticsearch.", what tokens are returned?
Elasticsearch
{
"analyzer": "whitespace",
"text": "Hello, World! Welcome to Elasticsearch."
}
💡 Hint
The whitespace analyzer splits only on whitespace; it does not lowercase or remove punctuation.
✗ Incorrect
The whitespace analyzer splits text only on whitespace and does not lowercase or remove punctuation, so the tokens keep their original casing and punctuation: ['Hello,', 'World!', 'Welcome', 'to', 'Elasticsearch.'].
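For inputs like this, the whitespace analyzer behaves like Python's `str.split()` with no arguments (a rough sketch: both split on runs of whitespace and apply no other processing):

```python
def whitespace_analyze(text):
    # The whitespace analyzer only splits on whitespace runs;
    # casing and punctuation are left untouched.
    return text.split()

print(whitespace_analyze("Hello, World! Welcome to Elasticsearch."))
# ['Hello,', 'World!', 'Welcome', 'to', 'Elasticsearch.']
```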
❓ Predict Output
Advanced · 2:30 remaining
What tokens does a custom analyzer with lowercase and stop filters output?
Given this _analyze request with a custom analyzer that uses the standard tokenizer, lowercase filter, and stop filter with stopwords ["the", "is"], what tokens are produced for the text "The quick brown fox is fast"?
Elasticsearch
{
"tokenizer": "standard",
"filter": ["lowercase", "stop"],
"text": "The quick brown fox is fast"
}
💡 Hint
The stop filter removes the configured stopwords after the lowercase filter runs.
✗ Incorrect
The standard tokenizer splits the text into words, the lowercase filter lowercases every token, and the stop filter then removes 'the' and 'is'. So the tokens are ['quick', 'brown', 'fox', 'fast'].
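The same pipeline (tokenize, lowercase, drop stopwords) can be sketched in Python. This illustrates the filter order, not Elasticsearch's actual implementation; note that the stop filter sees already-lowercased tokens, which is why 'The' is removed:

```python
import re

STOPWORDS = {"the", "is"}  # matches the custom analyzer's stop filter config

def custom_analyze(text):
    # 1. Standard-tokenizer approximation: split on non-alphanumerics.
    tokens = [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]
    # 2. Lowercase filter.
    tokens = [t.lower() for t in tokens]
    # 3. Stop filter: remove configured stopwords (runs after lowercasing).
    return [t for t in tokens if t not in STOPWORDS]

print(custom_analyze("The quick brown fox is fast"))
# ['quick', 'brown', 'fox', 'fast']
```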
❓ Predict Output
Advanced · 2:00 remaining
What error does this _analyze request produce?
What error will Elasticsearch return when running this _analyze request with an invalid tokenizer name?
Elasticsearch
{
"tokenizer": "nonexistent_tokenizer",
"text": "Test text"
}
💡 Hint
Elasticsearch returns a 400 error when a requested tokenizer is missing.
✗ Incorrect
If the tokenizer name is invalid, Elasticsearch returns a 400 Bad Request with a message indicating the tokenizer was not found.
🧠 Conceptual
Expert · 3:00 remaining
How many tokens are produced by this _analyze request with a pattern tokenizer?
Using the _analyze API with a pattern tokenizer that splits on commas and spaces (pattern: '[,\s]+') on the text "apple, banana orange,pear", how many tokens are produced?
Elasticsearch
{
"tokenizer": {
"type": "pattern",
"pattern": "[,\s]+"
},
"text": "apple, banana orange,pear"
}
💡 Hint
Count tokens split by commas or spaces.
✗ Incorrect
The pattern splits on commas or spaces, so the text splits into ['apple', 'banana', 'orange', 'pear'], which is 4 tokens.
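The pattern tokenizer's split can be reproduced with Python's `re.split` (a sketch; Elasticsearch's pattern tokenizer uses Java regular expressions, but this particular pattern behaves identically in both):

```python
import re

def pattern_analyze(text, pattern=r"[,\s]+"):
    # Split on the regex; drop empty strings produced at the edges.
    return [t for t in re.split(pattern, text) if t]

tokens = pattern_analyze("apple, banana orange,pear")
print(tokens, len(tokens))
# ['apple', 'banana', 'orange', 'pear'] 4
```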