Elasticsearch · How-To · Beginner · 4 min read

How to Create a Custom Analyzer in Elasticsearch: Syntax and Example

To create a custom analyzer in Elasticsearch, define it in the index settings under analysis.analyzer with components like tokenizer and filter. Then apply this analyzer to fields in your mapping to control how text is processed during indexing and searching.
📐

Syntax

A custom analyzer in Elasticsearch is defined inside the settings.analysis.analyzer section of an index. It includes a tokenizer that breaks text into tokens and optional filter components that modify tokens (like lowercasing or removing stop words).

The main parts are:

  • tokenizer: Splits text into terms.
  • filter: Processes tokens (e.g., lowercase, stem).
  • char_filter (optional): Preprocesses text before tokenizing.
json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
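The optional char_filter slot plugs into the same structure and runs before the tokenizer. A minimal sketch using the built-in html_strip character filter (the analyzer name here is illustrative):

json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_text_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

With this setup, markup like &lt;b&gt;Hello&lt;/b&gt; is stripped to Hello before tokenizing.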
💻

Example

This example creates an index with a custom analyzer named my_custom_analyzer that uses the standard tokenizer and applies lowercase and stop word filters. It then maps a field to use this analyzer.

json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
Output
{ "acknowledged": true, "shards_acknowledged": true, "index": "my_index" }
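You can verify how the analyzer processes text with the _analyze API (the sample text is illustrative):

json
POST /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "The Quick Foxes"
}

This should return the tokens quick and foxes: the standard tokenizer splits the text, lowercase normalizes it, and the stop filter drops the English stop word "the".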
⚠️

Common Pitfalls

Common mistakes when creating custom analyzers include:

  • Not specifying "type": "custom" for the analyzer (recent Elasticsearch versions infer custom when a tokenizer is set, but being explicit avoids ambiguity).
  • Using filters or tokenizers that are not installed or misspelled.
  • Forgetting to apply the custom analyzer to the field mapping.
  • Trying to update an analyzer on an existing index without reindexing.

Always create or reindex the index after defining a new analyzer.

json
PUT /wrong_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "bad_analyzer": {
          "tokenizer": "standard"
          /* Missing type: "custom" */
        }
      }
    }
  }
}

-- Correct way --

PUT /correct_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "good_analyzer": {
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
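If you must add an analyzer to an index that already exists, Elasticsearch only accepts updates to analysis settings while the index is closed. A sketch (the analyzer name is illustrative; note that documents indexed earlier are not re-analyzed, so a reindex is still needed for them):

json
POST /my_index/_close

PUT /my_index/_settings
{
  "analysis": {
    "analyzer": {
      "extra_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["lowercase"]
      }
    }
  }
}

POST /my_index/_open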
📊

Quick Reference

| Component   | Description                          | Example Values                |
|-------------|--------------------------------------|-------------------------------|
| tokenizer   | Breaks text into tokens              | standard, whitespace, keyword |
| filter      | Modifies tokens                      | lowercase, stop, stemmer      |
| char_filter | Preprocesses text before tokenizing  | html_strip, mapping           |
| type        | Analyzer type; use custom for custom analyzers | custom              |

Key Takeaways

  • Define custom analyzers in index settings under analysis.analyzer with "type": "custom".
  • Specify a tokenizer and filters to control text processing.
  • Apply the custom analyzer to fields in the mapping to use it.
  • You must create or reindex the index to apply analyzer changes.
  • Common errors include a missing "type" or misspelled component names.