Elasticsearch · query · ~20 mins

Analyzer components (tokenizer, filters) in Elasticsearch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ Elasticsearch Analyzer Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
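Before you start, keep the processing order in mind: the tokenizer splits the text first, then each token filter runs in the order listed. The following is a minimal Python sketch of that order — a toy model for intuition, not Elasticsearch itself; the tokenizer and filter implementations are simplified stand-ins:

```python
import re

def toy_analyze(text, tokenizer, filters):
    """Toy model of an Elasticsearch analyzer:
    tokenize first, then apply token filters in order."""
    tokens = tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

# Rough stand-in for the standard tokenizer: split on non-word characters.
standard = lambda text: [t for t in re.split(r"\W+", text) if t]

# Token filters transform the token list, in the order they appear.
lowercase = lambda tokens: [t.lower() for t in tokens]
stop = lambda tokens: [t for t in tokens if t not in {"the", "is"}]

print(toy_analyze("The Quick Fox is Here", standard, [lowercase, stop]))
# → ['quick', 'fox', 'here']
```

Note that swapping the filter order can change the result: if `stop` ran before `lowercase`, the capitalized "The" would survive the stopword check.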
Predict Output
intermediate
2:00 remaining
What is the output of this Elasticsearch analyzer test?
Given the following analyzer configuration and input text, what is the list of tokens produced?
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      },
      "filter": {
        "stop": {
          "type": "stop",
          "stopwords": ["the", "is"]
        }
      }
    }
  }
}

Input text: "The quick brown fox is jumping"
A. ["quick", "brown", "fox", "is", "jumping"]
B. ["The", "quick", "brown", "fox", "jumping"]
C. ["quick", "brown", "fox", "jumping"]
D. ["the", "quick", "brown", "fox", "is", "jumping"]
Attempts: 2 left
💡 Hint
Remember that the stop filter removes specified stopwords after tokenizing and lowercasing.
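You can check your prediction against a running cluster (assuming one is available locally) with the `_analyze` API, which accepts the tokenizer and inline filter definitions directly, without creating an index:

```
POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stop", "stopwords": ["the", "is"] }
  ],
  "text": "The quick brown fox is jumping"
}
```

The response lists each resulting token along with its position and character offsets.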
Predict Output
intermediate
2:00 remaining
What tokens result from this custom analyzer with a pattern tokenizer?
Given this analyzer configuration and input, what tokens are produced?
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pattern_analyzer": {
          "tokenizer": "pattern",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "pattern": {
          "type": "pattern",
          "pattern": "\\W+"
        }
      }
    }
  }
}

Input text: "Hello, World! Welcome to Elasticsearch."
A. ["hello", "world", "welcome", "to", "elasticsearch", ""]
B. ["hello", "world", "welcome", "to", "elasticsearch"]
C. ["Hello", "World", "Welcome", "to", "Elasticsearch", ""]
D. ["Hello", "World", "Welcome", "to", "Elasticsearch"]
Attempts: 2 left
💡 Hint
The pattern tokenizer splits on non-word characters, and the lowercase filter converts tokens to lowercase.
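The `_analyze` API also accepts an inline tokenizer definition, so you can verify this one against a local cluster without creating an index:

```
POST _analyze
{
  "tokenizer": { "type": "pattern", "pattern": "\\W+" },
  "filter": ["lowercase"],
  "text": "Hello, World! Welcome to Elasticsearch."
}
```

Pay attention to whether the trailing period leaves an empty token in the response.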
🔧 Debug
advanced
2:00 remaining
Why does this analyzer configuration cause an error?
This analyzer configuration causes an error when indexing. What is the cause?
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "bad_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "nonexistent_filter"]
        }
      }
    }
  }
}
A. The filter "nonexistent_filter" is not defined, causing a configuration error.
B. The standard tokenizer is deprecated and causes errors.
C. The lowercase filter must be defined explicitly in the filter section.
D. The analyzer must have at least two tokenizers defined.
Attempts: 2 left
💡 Hint
Check if all filters used are defined in the analysis settings.
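To see where this configuration fails, try creating an index with it on a local cluster (the index name here is illustrative):

```
PUT /analyzer_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "bad_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "nonexistent_filter"]
        }
      }
    }
  }
}
```

Elasticsearch validates analyzer definitions when the index is created, so the request is rejected before any document is indexed — which narrows down where the problem lies.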
Predict Output
advanced
2:00 remaining
What tokens are output by this analyzer with a synonym filter?
Given this analyzer configuration and input, what tokens are produced?
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonym"]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": ["quick,fast"]
        }
      }
    }
  }
}

Input text: "The quick fox"
A. ["the", "quick", "fast", "fox"]
B. ["the", "quick", "fox"]
C. ["the", "fast", "fox"]
D. ["quick", "fast", "fox"]
Attempts: 2 left
💡 Hint
A synonym filter adds synonyms as additional tokens at the same position; it does not replace the original tokens.
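Assuming a local cluster on a recent Elasticsearch version (which accepts inline synonym definitions in `_analyze`), you can verify this one directly:

```
POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "synonym", "synonyms": ["quick,fast"] }
  ],
  "text": "The quick fox"
}
```

Check the `position` field of each token in the response to see how the synonym relates to the original term.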
🧠 Conceptual
expert
3:00 remaining
Which filter order produces this token output?
You want to analyze the input "Running runs run" so that the tokens are lowercased and stemmed, with stopwords removed after stemming. Which filter order in the analyzer produces the tokens ["run", "run", "run"]?
A. filter: ["lowercase", "stop", "stemmer"]
B. filter: ["stemmer", "lowercase", "stop"]
C. filter: ["stop", "lowercase", "stemmer"]
D. filter: ["lowercase", "stemmer", "stop"]
Attempts: 2 left
💡 Hint
Think about when stopwords are removed relative to stemming and lowercasing.
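Each candidate order can be checked with the `_analyze` API against a local cluster — for example, option A's order (swap the `filter` list to try the others):

```
POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "stop", "stemmer"],
  "text": "Running runs run"
}
```

Comparing the responses for the four orders makes the interaction between stemming and stopword removal concrete.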