Complete the code to define a standard tokenizer in Elasticsearch.
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_tokenizer": {
          "type": "[1]"
        }
      }
    }
  }
}

The standard tokenizer is the default tokenizer in Elasticsearch; it breaks text into terms on word boundaries.
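As a rough illustration of word-boundary splitting (a hypothetical approximation only; the real standard tokenizer follows the Unicode Text Segmentation algorithm, UAX #29, which handles far more cases), the idea can be sketched in Python:

```python
import re

def standard_like_tokenize(text):
    # Extract runs of word characters, dropping punctuation.
    # This is a crude stand-in for the standard tokenizer, which
    # implements Unicode Text Segmentation rather than a simple regex.
    return re.findall(r"\w+", text)

print(standard_like_tokenize("The 2 QUICK Brown-Foxes!"))
# → ['The', '2', 'QUICK', 'Brown', 'Foxes']
```

Note how punctuation is stripped and the hyphenated word is split into two terms, matching the behavior described above.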
Complete the code to define a whitespace tokenizer in Elasticsearch.
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ws_tokenizer": {
          "type": "[1]"
        }
      }
    }
  }
}

The whitespace tokenizer splits text only on whitespace characters, keeping punctuation attached to words.
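The contrast with the standard tokenizer can be seen with a small Python sketch: splitting only on whitespace (roughly what Python's argument-free `str.split` does) leaves punctuation attached to each token:

```python
def whitespace_tokenize(text):
    # str.split() with no arguments splits on any run of whitespace,
    # mirroring the whitespace tokenizer's behavior: punctuation such
    # as commas and exclamation marks stays attached to the tokens.
    return text.split()

print(whitespace_tokenize("Hello, World! foo-bar"))
# → ['Hello,', 'World!', 'foo-bar']
```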
Fix the error in the pattern tokenizer definition by completing the pattern field.
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "[1]"
        }
      }
    }
  }
}

The pattern tokenizer splits text wherever its regular expression matches. The pattern \s+ matches one or more whitespace characters, so the text is split on runs of whitespace.
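The effect of the \s+ pattern can be previewed with Python's `re.split` (Elasticsearch itself uses Java regular expressions, but the two agree for this simple pattern):

```python
import re

# Splitting on one-or-more whitespace characters: consecutive spaces
# collapse into a single split point, so no empty tokens appear between words.
tokens = re.split(r"\s+", "one two   three")
print(tokens)
# → ['one', 'two', 'three']
```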
Fill both blanks to define a pattern tokenizer that splits on commas and spaces.
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "comma_space_tokenizer": {
          "type": "[1]",
          "pattern": "[2]"
        }
      }
    }
  }
}

The tokenizer type must be pattern to use a regex pattern. The pattern \s*,\s* splits text on commas surrounded by optional spaces.
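Again, the \s*,\s* pattern can be checked with Python's `re.split` as a stand-in for the Java regex engine Elasticsearch uses; both treat this pattern the same way:

```python
import re

# \s*,\s* matches a comma plus any surrounding whitespace, so
# "a, b" , "a ,b", and "a,b" all split into the same clean tokens.
tokens = re.split(r"\s*,\s*", "apple, banana ,cherry,date")
print(tokens)
# → ['apple', 'banana', 'cherry', 'date']
```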
Fill all three blanks to define a custom analyzer using the standard tokenizer with the lowercase and stop filters.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "[1]",
          "filter": ["[2]", "[3]"]
        }
      }
    }
  }
}

The custom analyzer uses the standard tokenizer. The filters include lowercase to convert tokens to lowercase and stop to remove common stop words.
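The tokenizer-then-filters pipeline can be sketched end to end in Python. This is an illustrative approximation, not Elasticsearch's implementation: the regex stands in for the standard tokenizer, and STOP_WORDS is a small made-up subset, not the actual _english_ stop word list:

```python
import re

# Illustrative subset only; Elasticsearch's _english_ stop list is larger.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "is"}

def analyze(text):
    tokens = re.findall(r"\w+", text)                   # stand-in for the standard tokenizer
    tokens = [t.lower() for t in tokens]                # lowercase filter
    return [t for t in tokens if t not in STOP_WORDS]   # stop filter

print(analyze("The Quick Brown Fox and the Lazy Dog"))
# → ['quick', 'brown', 'fox', 'lazy', 'dog']
```

Note that filter order matters: lowercasing before the stop filter ensures "The" is normalized to "the" and then removed.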