Elasticsearch · How-To · Beginner · 4 min read

How to Create a Custom Analyzer in Elasticsearch: Syntax and Example

To create a custom analyzer in Elasticsearch, define it in the index settings under analysis.analyzer with components like tokenizer and filter. Then apply this analyzer to fields in your mapping to control how text is processed during indexing and searching.
📐

Syntax

A custom analyzer in Elasticsearch is defined inside the settings.analysis.analyzer section of an index. It includes a tokenizer that breaks text into tokens and optional filter components that modify tokens (like lowercasing or removing stop words).

The main parts are:

  • tokenizer: Splits text into terms.
  • filter: Processes tokens (e.g., lowercase, stem).
  • char_filter (optional): Preprocesses text before tokenizing.
json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
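The optional char_filter slot plugs into the same structure and runs before the tokenizer. A minimal sketch using the built-in html_strip character filter (the analyzer name here is illustrative):

json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_text_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

With this setup, markup like &lt;b&gt;Hello&lt;/b&gt; is stripped to Hello before tokenizing.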
💻

Example

This example creates an index with a custom analyzer named my_custom_analyzer that uses the standard tokenizer and applies lowercase and stop word filters. It then maps a field to use this analyzer.

json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
Output
{ "acknowledged": true, "shards_acknowledged": true, "index": "my_index" }
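You can verify how the analyzer processes text with the _analyze API (the sample text is illustrative):

json
POST /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "The Quick Foxes"
}

This should return the tokens quick and foxes: the standard tokenizer splits the text, lowercase normalizes it, and the stop filter drops the English stop word "the".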
⚠️

Common Pitfalls

Common mistakes when creating custom analyzers include:

  • Not specifying "type": "custom" for the analyzer (recent Elasticsearch versions infer custom when a tokenizer is set, but being explicit avoids ambiguity).
  • Using filters or tokenizers that are not installed or misspelled.
  • Forgetting to apply the custom analyzer to the field mapping.
  • Trying to update an analyzer on an existing index without reindexing.

Always create or reindex the index after defining a new analyzer.

json
PUT /wrong_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "bad_analyzer": {
          "tokenizer": "standard"
          /* Missing type: "custom" */
        }
      }
    }
  }
}

-- Correct way --

PUT /correct_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "good_analyzer": {
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
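If you must add an analyzer to an index that already exists, Elasticsearch only accepts updates to analysis settings while the index is closed. A sketch (the analyzer name is illustrative; note that documents indexed earlier are not re-analyzed, so a reindex is still needed for them):

json
POST /my_index/_close

PUT /my_index/_settings
{
  "analysis": {
    "analyzer": {
      "extra_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["lowercase"]
      }
    }
  }
}

POST /my_index/_open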
📊

Quick Reference

| Component   | Description                          | Example Values                |
|-------------|--------------------------------------|-------------------------------|
| tokenizer   | Breaks text into tokens              | standard, whitespace, keyword |
| filter      | Modifies tokens                      | lowercase, stop, stemmer      |
| char_filter | Preprocesses text before tokenizing  | html_strip, mapping           |
| type        | Analyzer type; use custom for custom analyzers | custom              |

Key Takeaways

  • Define custom analyzers in index settings under analysis.analyzer with "type": "custom".
  • Specify a tokenizer and filters to control text processing.
  • Apply the custom analyzer to fields in the mapping to use it.
  • You must create or reindex the index to apply analyzer changes.
  • Common errors include a missing "type" or misspelled component names.