How to Use edge_ngram for Autocomplete in Elasticsearch
Use the edge_ngram tokenizer in your Elasticsearch index mapping to break words into prefixes for autocomplete. Define a custom analyzer that uses edge_ngram for indexing, and a standard analyzer for search, so user input matches the indexed prefixes efficiently.

Syntax
The edge_ngram tokenizer splits text into smaller parts starting from the beginning of the word, which helps autocomplete by matching prefixes.
Key parts:
- tokenizer: Defines how text is split; edge_ngram creates prefixes.
- min_gram: Minimum length of generated tokens.
- max_gram: Maximum length of generated tokens.
- analyzer: Combines tokenizer and filters for indexing and searching.
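To see what this tokenizer produces, here is a small Python sketch (an illustration only, not Elasticsearch code) that mimics how edge n-grams are generated from a single word:

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    """Mimic the edge_ngram tokenizer: emit prefixes of length min_gram..max_gram."""
    word = word.lower()  # the lowercase filter in the analyzer performs this step
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

print(edge_ngrams("Apple"))  # ['a', 'ap', 'app', 'appl', 'apple']
```

Every prefix of "Apple" becomes a separate indexed token, which is why a later query for "app" can find the document.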
```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer",
          "filter": ["lowercase"]
        },
        "autocomplete_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
```

Example
This example creates an index with edge_ngram for autocomplete on the name field. It then indexes sample documents and shows a search query that returns autocomplete suggestions.
```json
PUT /autocomplete_example
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer",
          "filter": ["lowercase"]
        },
        "autocomplete_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}

POST /autocomplete_example/_doc/1
{
  "name": "Apple"
}

POST /autocomplete_example/_doc/2
{
  "name": "Application"
}

POST /autocomplete_example/_doc/3
{
  "name": "Banana"
}

GET /autocomplete_example/_search
{
  "query": {
    "match": {
      "name": "app"
    }
  }
}
```

Output

```json
{
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "hits": [
      {"_source": {"name": "Apple"}},
      {"_source": {"name": "Application"}}
    ]
  }
}
```
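The matching behavior can be simulated in a few lines of Python (an illustration only, not Elasticsearch code): each document is indexed as its set of prefixes, while the search analyzer leaves the query as a single lowercased token.

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    """Prefixes the edge_ngram tokenizer would index for a word."""
    word = word.lower()
    return {word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)}

# Index time: each name is stored as its set of prefixes.
index = {name: edge_ngrams(name) for name in ["Apple", "Application", "Banana"]}

# Search time: the standard tokenizer + lowercase filter yield one token, "app".
query = "app"
hits = [name for name, tokens in index.items() if query in tokens]
print(hits)  # ['Apple', 'Application']
```

"app" is an indexed prefix of both "apple" and "application" but not of "banana", which reproduces the two hits in the output above.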
Common Pitfalls
Common mistakes when using edge_ngram for autocomplete include:
- Using edge_ngram in the search_analyzer, which causes poor matching; it should only be in the index analyzer.
- Setting min_gram too high, which misses short prefixes.
- Not using the lowercase filter, causing case-sensitive mismatches.
- Applying the edge_ngram tokenizer to fields that do not need prefix matching, which increases index size unnecessarily.
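The first two pitfalls can be demonstrated with a small Python sketch (illustrative only, not Elasticsearch code) of prefix generation:

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    """Prefixes the edge_ngram tokenizer would emit for a word."""
    word = word.lower()
    return {word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)}

# Pitfall 1: edge_ngram on the search side. "band" is not a prefix of
# "banana", so a single-token query correctly finds nothing...
print("band" in edge_ngrams("Banana"))  # False
# ...but n-gramming the query too yields "b", "ba", "ban", which are all
# indexed prefixes of "banana", so the document becomes a spurious hit.
print(bool(edge_ngrams("band") & edge_ngrams("Banana")))  # True

# Pitfall 2: min_gram too high. With min_gram=3, the one- and two-letter
# prefixes are never indexed, so a user typing "ap" gets no suggestions.
print("ap" in edge_ngrams("Apple", min_gram=3))  # False
```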
Correct usage follows the configuration shown in the Syntax section above: edge_ngram in the index analyzer only, with a standard tokenizer plus lowercase filter in the search analyzer.

Quick Reference
- edge_ngram tokenizer: Generates prefixes for autocomplete.
- min_gram: Smallest prefix length (usually 1).
- max_gram: Longest prefix length (depends on expected input length).
- index analyzer: Uses the edge_ngram tokenizer for prefix indexing.
- search analyzer: Uses the standard tokenizer for normal search input.
- lowercase filter: Ensures case-insensitive matching.
Key Takeaways
- Use the edge_ngram tokenizer only in the index analyzer to generate prefixes for autocomplete.
- Set min_gram to 1 and max_gram to a reasonable length to cover expected input prefixes.
- Use a standard tokenizer with the lowercase filter for the search analyzer to match user input correctly.
- Avoid applying edge_ngram in the search analyzer to prevent poor search results.
- Always include the lowercase filter to make autocomplete case-insensitive.