How to Create Custom Analyzer in Elasticsearch: Syntax and Example
To create a custom analyzer in Elasticsearch, define it in the index settings under
analysis.analyzer with components like tokenizer and filter. Then apply this analyzer to fields in your mapping to control how text is processed during indexing and searching.

Syntax
A custom analyzer in Elasticsearch is defined inside the settings.analysis.analyzer section of an index. It includes a tokenizer that breaks text into tokens and optional filter components that modify tokens (like lowercasing or removing stop words).
The main parts are:
- tokenizer: Splits text into terms.
- filter: Processes tokens (e.g., lowercase, stem).
- char_filter (optional): Preprocesses text before tokenizing.
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```

Example
This example creates an index with a custom analyzer named my_custom_analyzer that uses the standard tokenizer and applies lowercase and stop word filters. It then maps a field to use this analyzer.
```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
```

Output

```json
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "my_index"
}
```
Common Pitfalls
Common mistakes when creating custom analyzers include:
- Not specifying "type": "custom" for the analyzer.
- Using tokenizers or filters that are misspelled or not installed.
- Forgetting to apply the custom analyzer to the field mapping.
- Trying to update an analyzer on an existing index without reindexing.
Because analysis settings are static, always define analyzers at index creation, or reindex after changing them.
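If you do need to add an analyzer to an existing index, one approach is to close the index, update its settings, and reopen it. A sketch (the analyzer name is illustrative, and the index rejects reads and writes while closed):

```json
POST /my_index/_close

PUT /my_index/_settings
{
  "analysis": {
    "analyzer": {
      "my_new_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["lowercase"]
      }
    }
  }
}

POST /my_index/_open
```

Note that this only registers the analyzer; documents already indexed with a different analyzer must still be reindexed to pick it up.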
```json
PUT /wrong_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "bad_analyzer": {
          "tokenizer": "standard"
          /* Missing "type": "custom" */
        }
      }
    }
  }
}
```

Correct way:

```json
PUT /correct_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "good_analyzer": {
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
```

Quick Reference
| Component | Description | Example Values |
|---|---|---|
| tokenizer | Breaks text into tokens | standard, whitespace, keyword |
| filter | Modifies tokens | lowercase, stop, stemmer |
| char_filter | Preprocesses text before tokenizing | html_strip, mapping |
| type | Analyzer type, must be 'custom' for custom analyzers | custom |
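The char_filter component appears in the table but not in the earlier examples. A sketch using the built-in html_strip character filter to remove markup before tokenizing (index and analyzer names are illustrative):

```json
PUT /html_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

Character filters run first, so text like `<b>Hello</b>` is reduced to `Hello` before the tokenizer sees it.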
Key Takeaways
- Define custom analyzers in index settings under analysis.analyzer with type 'custom'.
- Specify a tokenizer and filters to control text processing.
- Apply the custom analyzer to fields in the mapping to use it.
- You must create or reindex the index to apply analyzer changes.
- Common errors include missing 'type' or misspelled components.