
Custom analyzers in Elasticsearch

Introduction

Custom analyzers control how Elasticsearch breaks text into tokens for indexing and search. By combining a tokenizer with character and token filters, you can tailor matching to your data and make searches more accurate.

You want to ignore certain words like 'the' or 'and' in searches.
You need to make searches case-insensitive so 'Apple' and 'apple' match.
You want to break text into words differently, like splitting on hyphens.
You want to add special rules for stemming words to match similar forms.
You want to handle languages with special characters or accents properly.
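The last two cases above can be combined in a single analyzer. Here is a minimal sketch (the analyzer name folded_english is illustrative) using the built-in asciifolding and stemmer token filters:

Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folded_english": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "stemmer"]
        }
      }
    }
  }
}

With this analyzer, accented terms like 'café' are folded to 'cafe', and the default English stemmer reduces forms like 'running' toward their stem.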
Syntax
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}

The type must be custom to define your own analyzer.

You choose a tokenizer to split text into tokens; the token filters in the filter list then run in order (here, lowercase runs before stop).
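A custom analyzer can also include character filters, which run on the raw text before the tokenizer. A sketch (the analyzer name html_aware is illustrative) using the built-in html_strip character filter:

Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_aware": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

This removes HTML markup such as &lt;b&gt; tags before the text is tokenized.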

Examples
This analyzer splits text normally and makes all words lowercase.
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "simple_lowercase": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
This analyzer also removes common stop words like 'and', 'the'.
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "stopword_remover": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
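The stop filter defaults to the English stop word list, but you can supply your own. A sketch with a custom filter definition (the names my_stopwords and my_stop are illustrative):

Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stopwords": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop"]
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["and", "the", "or"]
        }
      }
    }
  }
}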
This analyzer creates partial word matches for autocomplete using edge n-grams.
Elasticsearch
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_edge_ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "edge_ngram"]
        }
      },
      "filter": {
        "edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 5
        }
      }
    }
  }
}
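You can check what the n-gram analyzer produces with the _analyze API (assuming the settings above were applied to an index named my_index):

Elasticsearch
GET /my_index/_analyze
{
  "analyzer": "my_edge_ngram",
  "text": "quick"
}

With min_gram 2 and max_gram 5, 'quick' yields the tokens qu, qui, quic, and quick, which is what lets a query prefix like 'qui' match the full word.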
Sample Program

This creates an index with a custom analyzer that lowercases text and removes stop words. Then it shows how the text is broken down by the analyzer.

Elasticsearch
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_lowercase_stop": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "analyzer": "custom_lowercase_stop",
  "text": "The Quick Brown Fox jumps over the lazy Dog"
}
Output

The stop filter drops 'the' and the lowercase filter normalizes the rest, so the response contains the tokens: quick, brown, fox, jumps, over, lazy, dog.
Important Notes

Custom analyzers let you mix and match tokenizers and filters to fit your search needs.

Stop words are common words that usually add little meaning; removing them shrinks the index and reduces noise in search results.

Always test your analyzer with the _analyze API to see how text is processed.

Summary

Custom analyzers control how Elasticsearch breaks down and processes text.

You can add filters like lowercase and stop words to improve search results.

Testing analyzers helps you understand how your text is indexed and searched.