
Autocomplete with edge n-gram in Elasticsearch - Step-by-Step Execution

Concept Flow - Autocomplete with edge n-gram
User types prefix
Elasticsearch receives query
Search edge n-gram tokens
Match tokens starting with prefix
Return autocomplete suggestions
User sees suggestions
When a user types a prefix, Elasticsearch searches tokens created by edge n-gram to find matching suggestions starting with that prefix.
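To make the idea of "tokens created by edge n-gram" concrete, here is a minimal Python sketch that mimics what the tokenizer does to a single word. It is only a simulation, not Elasticsearch code: the function name `edge_ngrams` is made up for illustration, and it assumes the input is one lowercase word made of letters only.

```python
def edge_ngrams(word, min_gram=1, max_gram=10):
    """Return every prefix of `word` from min_gram to max_gram characters long.

    Hypothetical helper that imitates an edge n-gram tokenizer on one word.
    """
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


tokens = edge_ngrams("michael")
print(tokens)
# ['m', 'mi', 'mic', 'mich', 'micha', 'michae', 'michael']
```

Each of these prefixes becomes a searchable token, which is why a partially typed word can be found with an ordinary term lookup.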
Execution Sample
Elasticsearch
PUT /autocomplete_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": ["letter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}

POST /autocomplete_example/_doc?refresh=true
{
  "name": "michael"
}

POST /autocomplete_example/_doc?refresh=true
{
  "name": "michelle"
}

GET /autocomplete_example/_search
{
  "query": {
    "match": {
      "name": "mic"
    }
  }
}
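This code creates an index whose 'name' field is analyzed with an edge n-gram tokenizer at index time, then runs a match query for the prefix 'mic'. The walkthrough that follows assumes documents named 'michael' and 'michelle' have been indexed before the search runs.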
Execution Table
| Step | Action | Input Text | Tokens Created | Query Tokens | Matched Tokens | Output Suggestions |
|---|---|---|---|---|---|---|
| 1 | Index document | michael | ["m", "mi", "mic", "mich", "micha", "michae", "michael"] | - | - | - |
| 2 | Index document | michelle | ["m", "mi", "mic", "mich", "miche", "michel", "michell", "michelle"] | - | - | - |
| 3 | User types prefix | mic | - | ["mic"] | - | - |
| 4 | Search edge n-gram tokens | - | - | ["mic"] | ["mic"] | - |
| 5 | Match tokens starting with prefix | - | - | ["mic"] | ["mic"] | - |
| 6 | Return autocomplete suggestions | - | - | - | - | ["michael", "michelle"] |
| 7 | User sees suggestions | - | - | - | - | ["michael", "michelle"] |
💡 Autocomplete suggestions are returned once the prefix 'mic' matches an indexed edge n-gram token.
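The steps in the table can be approximated in plain Python with a small inverted index (token to document names). This is a rough sketch of the mechanism, not how Elasticsearch is implemented internally. The helpers `edge_ngrams`, `build_index`, and `suggest` are hypothetical names, and the extra document 'maria' is added only to show a name that does not match.

```python
from collections import defaultdict


def edge_ngrams(word, min_gram=1, max_gram=10):
    """Imitate the edge n-gram tokenizer on a single lowercase word."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


def build_index(names):
    """Steps 1-2: map every prefix token to the names that produced it."""
    index = defaultdict(set)
    for name in names:
        for token in edge_ngrams(name):
            index[token].add(name)
    return index


def suggest(index, prefix):
    """Steps 3-6: look up the typed prefix as one exact token."""
    return sorted(index.get(prefix, set()))


index = build_index(["michael", "michelle", "maria"])
print(suggest(index, "mic"))  # ['michael', 'michelle']
print(suggest(index, "m"))    # ['maria', 'michael', 'michelle']
print(suggest(index, "x"))    # []
```

Note that the lookup is an exact match on the token 'mic'. The prefix behaviour comes entirely from the tokens stored at index time.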
Variable Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 6 | Final |
|---|---|---|---|---|---|---|---|
| Input Text | - | michael | michelle | mic | mic | mic | mic |
| Tokens Created | - | ["m", "mi", "mic", "mich", "micha", "michae", "michael"] | ["m", "mi", "mic", "mich", "miche", "michel", "michell", "michelle"] | - | - | - | - |
| Query Tokens | - | - | - | ["mic"] | ["mic"] | ["mic"] | ["mic"] |
| Matched Tokens | - | - | - | - | ["mic"] | ["mic"] | ["mic"] |
| Output Suggestions | - | - | - | - | - | ["michael", "michelle"] | ["michael", "michelle"] |
Key Moments - 3 Insights
Why do we create multiple tokens like "m", "mi", "mic" for a single word?
Because the edge n-gram tokenizer emits every prefix of a word, from min_gram up to max_gram characters, whatever prefix the user has typed so far already exists as an indexed token (see Execution Table, steps 1 and 2).
Why is the search analyzer set to standard instead of autocomplete_analyzer?
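To avoid breaking the search query into n-grams. If the query 'mic' were also run through autocomplete_analyzer, it would become 'm', 'mi', 'mic', and with the match query's default OR operator it would match every name starting with 'm'. The standard analyzer keeps the typed prefix as a single term, which is then matched against the indexed edge n-gram tokens (see Execution Table, step 3).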
What happens if the user types a prefix longer than max_gram?
Tokens are created only up to max_gram characters (10 here). A typed prefix longer than that has no indexed token to match, so the query returns no suggestions, even for words that do start with that prefix.
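The max_gram limit is easy to see in a small simulation. The sketch below reuses a hypothetical `edge_ngrams` helper (an imitation of the tokenizer, not an Elasticsearch API) and an illustrative 20-letter word that is not part of the original example.

```python
def edge_ngrams(word, min_gram=1, max_gram=10):
    """Imitate the edge n-gram tokenizer on a single lowercase word."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


word = "internationalization"  # illustrative word, longer than max_gram
tokens = set(edge_ngrams(word, max_gram=10))

# A 10-character prefix is indexed; an 11-character prefix is not.
results = {query: query in tokens for query in ["internatio", "internation"]}
print(results)
# {'internatio': True, 'internation': False}
```

If users are likely to type more than max_gram characters, one option is to raise max_gram, at the cost of a larger index.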
Visual Quiz - 3 Questions
Test your understanding
Look at the Execution Table at Step 1. What tokens are created for the word "michael"?
A. ["michael"]
B. ["m", "mi", "mic", "mich", "micha", "michae", "michael"]
C. ["mic", "mich", "micha"]
D. ["m", "mi", "michelle"]
💡 Hint
Check the Tokens Created column at Step 1 in the Execution Table.
At which step does Elasticsearch match the user query tokens with the indexed tokens?
A. Step 3
B. Step 6
C. Step 4
D. Step 2
💡 Hint
Look for the step where the Matched Tokens column is first filled in the Execution Table.
If we change max_gram to 3, how would the tokens for "michelle" change at Step 2?
A. ["m", "mi", "mic"]
B. ["michelle"]
C. ["m", "mi", "mic", "mich"]
D. No tokens would be created
💡 Hint
Tokens are created only up to max_gram length; see the Tokens Created column at Step 2.
Concept Snapshot
Autocomplete with edge n-gram:
- Use edge_ngram tokenizer to create prefix tokens
- Index tokens like 'm', 'mi', 'mic' for words
- Search uses the standard analyzer so the typed prefix stays a single term
- The typed prefix matches an indexed prefix token exactly
- Returns suggestions starting with typed prefix
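The point about the search analyzer can be checked with a short simulation that compares the two choices. This is a sketch under simplifying assumptions: `edge_ngrams` and `matches` are hypothetical helpers, the match query is modelled with its default OR operator, and 'maria' is an extra illustrative name.

```python
def edge_ngrams(word, min_gram=1, max_gram=10):
    """Imitate the edge n-gram tokenizer on a single lowercase word."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


def matches(name, query_tokens):
    """OR semantics: the document is a hit if any query token was indexed."""
    indexed = set(edge_ngrams(name))
    return any(token in indexed for token in query_tokens)


names = ["michael", "michelle", "maria"]

standard_query = ["mic"]           # standard analyzer: one term
ngram_query = edge_ngrams("mic")   # edge n-gram analyzer: ['m', 'mi', 'mic']

with_standard = [n for n in names if matches(n, standard_query)]
with_ngram = [n for n in names if matches(n, ngram_query)]

print(with_standard)  # ['michael', 'michelle']
print(with_ngram)     # ['michael', 'michelle', 'maria']
```

With the n-grammed query, 'maria' is returned because it shares the token 'm', which is the over-matching that the separate search_analyzer setting avoids.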
Full Transcript
This visual trace shows how Elasticsearch uses the edge n-gram tokenizer to support autocomplete. At index time, words like 'michael' are broken into prefix tokens such as 'm', 'mi', and 'mic'. When a user types a prefix like 'mic', Elasticsearch looks up the indexed token 'mic' and returns the documents that produced it, here 'michael' and 'michelle'. The search analyzer is set to standard so that the query stays a single term instead of being split into n-grams. The Execution Table tracks each step from indexing to returning suggestions, and the Variable Tracker shows how tokens and queries evolve. The Key Moments explain why multiple tokens are created and why the search analyzer differs from the index analyzer. The quiz tests understanding of token creation and the matching step.