Challenge - 5 Problems
Tokenizer Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 2:00 remaining
Output of Standard Tokenizer on a Sample Text
Given the following Elasticsearch analyzer configuration using the standard tokenizer, what are the output tokens for the input text
"Hello, world! This is Elasticsearch."?
{
"analyzer": {
"my_standard_analyzer": {
"tokenizer": "standard"
}
}
}
Input text: "Hello, world! This is Elasticsearch."
Attempts:
2 left
💡 Hint
The standard tokenizer splits text into words, removing most punctuation, and preserves case.
✗ Incorrect
The standard tokenizer breaks text into words by removing most punctuation and preserves the original case of tokens.
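The behavior described above can be sketched outside Elasticsearch. The real standard tokenizer uses Unicode text segmentation (UAX #29), but for plain ASCII input like this a simple word-character regex produces the same tokens; the Python snippet below is an approximation, not the actual tokenizer:

```python
import re

text = "Hello, world! This is Elasticsearch."

# Approximate the standard tokenizer for ASCII text:
# keep runs of word characters, drop punctuation, preserve case.
tokens = re.findall(r"\w+", text)

print(tokens)
# ['Hello', 'world', 'This', 'is', 'Elasticsearch']
```

Note that the tokens keep their original casing; lowercasing only happens if a lowercase token filter is added to the analyzer.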
❓ Predict Output
intermediate · 2:00 remaining
Whitespace Tokenizer Output for a Given Text
What tokens does the whitespace tokenizer produce for the input text
"Quick brown fox jumps over the lazy dog."?
{
"analyzer": {
"my_whitespace_analyzer": {
"tokenizer": "whitespace"
}
}
}
Input text: "Quick brown fox jumps over the lazy dog."
Attempts:
2 left
💡 Hint
The whitespace tokenizer splits tokens only on spaces and does not lowercase or remove punctuation.
✗ Incorrect
The whitespace tokenizer splits text only on spaces, so punctuation remains attached to tokens and casing is preserved.
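For this input, the whitespace tokenizer behaves like Python's `str.split()`, which also splits only on runs of whitespace; the sketch below illustrates the expected tokens:

```python
text = "Quick brown fox jumps over the lazy dog."

# The whitespace tokenizer splits only on whitespace,
# so "dog." keeps its trailing period and casing is untouched.
tokens = text.split()

print(tokens)
# ['Quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog.']
```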
❓ Predict Output
advanced · 2:00 remaining
Pattern Tokenizer with Custom Regex Output
Using the pattern tokenizer with the regex
"\\W+" (non-word characters as delimiters), what tokens are produced from the input text "Email me at user@example.com!"?
{
"analyzer": {
"my_pattern_analyzer": {
"filter": ["lowercase"],
"tokenizer": {
"type": "pattern",
"pattern": "\\W+"
}
}
}
}
Input text: "Email me at user@example.com!"
Attempts:
2 left
💡 Hint
The pattern tokenizer splits on non-word characters, and the lowercase filter converts tokens to lowercase.
✗ Incorrect
The pattern tokenizer splits on any sequence of non-word characters (like @, ., !), so 'user@example.com' splits into 'user', 'example', 'com'. The lowercase filter makes all tokens lowercase.
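Elasticsearch's pattern tokenizer uses Java regular expressions, but for the pattern "\\W+" Python's `re` module splits identically, so the result can be sketched like this (with `.lower()` standing in for the lowercase filter):

```python
import re

text = "Email me at user@example.com!"

# Split on runs of non-word characters, as the pattern tokenizer does.
# re.split leaves an empty string when the text ends with a delimiter,
# so empty tokens are filtered out; .lower() mimics the lowercase filter.
tokens = [t.lower() for t in re.split(r"\W+", text) if t]

print(tokens)
# ['email', 'me', 'at', 'user', 'example', 'com']
```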
❓ Predict Output
advanced · 2:00 remaining
Effect of Pattern Tokenizer with Complex Regex
What tokens result from using a pattern tokenizer with the regex
"[ ,.!]+" on the input text "Hello, world! Welcome to Elasticsearch."?
{
"analyzer": {
"complex_pattern_analyzer": {
"tokenizer": {
"type": "pattern",
"pattern": "[ ,.!]+"
}
}
}
}
Input text: "Hello, world! Welcome to Elasticsearch."
Attempts:
2 left
💡 Hint
The pattern tokenizer splits on any sequence of spaces, commas, periods, or exclamation marks.
✗ Incorrect
The regex splits tokens on spaces, commas, periods, and exclamation marks, so punctuation is removed and tokens are split cleanly.
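As before, Python's `re.split` handles this character-class pattern the same way as the Java regex engine the pattern tokenizer uses, so the tokens can be sketched as follows. Note that this analyzer has no lowercase filter, so casing is preserved:

```python
import re

text = "Hello, world! Welcome to Elasticsearch."

# Split on runs of spaces, commas, periods, or exclamation marks,
# filtering the empty string left by the trailing period.
tokens = [t for t in re.split(r"[ ,.!]+", text) if t]

print(tokens)
# ['Hello', 'world', 'Welcome', 'to', 'Elasticsearch']
```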
🧠 Conceptual
expert · 2:00 remaining
Choosing the Correct Tokenizer for Case-Sensitive Search
You want to create an Elasticsearch analyzer that preserves the original casing of tokens and splits text only on whitespace. Which tokenizer should you use to achieve this?
Attempts:
2 left
💡 Hint
Think about which tokenizer splits only on spaces and does not change case by default.
✗ Incorrect
The whitespace tokenizer splits tokens only on whitespace and does not lowercase tokens, preserving original casing. The standard tokenizer also preserves case, but it removes punctuation and splits on more than just whitespace (it is the standard *analyzer* that lowercases, via its lowercase token filter). A pattern tokenizer can be configured to split on whitespace, but the whitespace tokenizer does this out of the box with no extra configuration.
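The contrast can be sketched on a mixed-case input, again using `str.split()` for the whitespace tokenizer and a word-character regex as a rough stand-in for the standard tokenizer:

```python
import re

text = "Search QUICK Foxes!"

# Whitespace tokenizer: splits only on spaces, keeps case and punctuation.
whitespace_tokens = text.split()
print(whitespace_tokens)   # ['Search', 'QUICK', 'Foxes!']

# Standard-tokenizer approximation: strips punctuation but still keeps
# case; lowercasing would only come from a separate lowercase filter.
standard_like = re.findall(r"\w+", text)
print(standard_like)       # ['Search', 'QUICK', 'Foxes']
```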