Autocomplete with edge n-gram in Elasticsearch - Time & Space Complexity
When using autocomplete with edge n-grams in Elasticsearch, it's important to understand how search time changes as the data grows: how does the number of operations scale with the number of documents? Let's analyze the time complexity of the following autocomplete query, which uses edge n-grams.
```json
POST /products/_search
{
  "query": {
    "match": {
      "name.edge_ngram": {
        "query": "app",
        "operator": "and"
      }
    }
  }
}
```
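For context, the edge n-grams are produced at index time by the analyzer attached to the field. A possible mapping for such a "name.edge_ngram" subfield might look like this (the analyzer and filter names are illustrative assumptions, not taken from the query above):

```json
PUT /products
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "edge_ngram_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "edge_ngram": {
            "type": "text",
            "analyzer": "edge_ngram_analyzer",
            "search_analyzer": "standard"
          }
        }
      }
    }
  }
}
```

Note the `search_analyzer` is set to `standard` so the query string "app" is matched as-is rather than being n-grammed again at search time.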
This query matches the term "app" against the "name.edge_ngram" field, whose tokens are the edge n-grams (prefixes) generated from each document's name at index time, so a prefix search becomes an exact term match.
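To make that concrete, edge n-gram tokenization can be sketched in a few lines of Python (a toy model of the edge_ngram token filter with min_gram=1):

```python
def edge_ngrams(word, min_gram=1, max_gram=10):
    # All prefixes of `word` between min_gram and max_gram characters long.
    return [word[:i] for i in range(min_gram, min(len(word), max_gram) + 1)]

print(edge_ngrams("apple"))  # ['a', 'ap', 'app', 'appl', 'apple']
```

Because "app" is one of the tokens stored for "apple", the match query above finds that document with a plain term lookup.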
Let's find the repeated steps in this search process.
- Primary operation: Checking the query term against the edge n-gram tokens stored in the index.
- How many times: In a simple model, once per edge n-gram token generated from each document's field.
As the number of documents grows, so does the number of stored tokens (and matching documents), so the search has more tokens to check.
| Input Size (n) | Approx. Operations (assuming ~10 tokens per document) |
|---|---|
| 10 documents | ~100 token checks |
| 100 documents | ~1,000 token checks |
| 1,000 documents | ~10,000 token checks |
Pattern observation: The number of token checks grows roughly linearly with the number of documents and tokens.
Time Complexity: O(n)
This means the search time grows linearly with the number of tokens generated by edge n-grams in the index.
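The linear pattern can be sanity-checked with a small simulation of the naive token-scan model (a sketch only; real Elasticsearch consults an inverted index rather than scanning every token, but the number of stored tokens still grows linearly with the corpus):

```python
def edge_ngrams(word, min_gram=1, max_gram=10):
    # All prefixes of `word` between min_gram and max_gram characters long.
    return [word[:i] for i in range(min_gram, min(len(word), max_gram) + 1)]

def count_token_checks(docs, term):
    # Naive model: compare the query term against every stored edge n-gram token.
    checks, matches = 0, 0
    for doc in docs:
        for token in edge_ngrams(doc):
            checks += 1
            if token == term:
                matches += 1
    return checks, matches

for n in (10, 100, 1000):
    checks, _ = count_token_checks(["apple"] * n, "app")
    print(f"{n} docs -> {checks} token checks")
# 10 docs -> 50 token checks
# 100 docs -> 500 token checks
# 1000 docs -> 5000 token checks
```

Each "apple" yields 5 tokens here, so the checks scale by exactly 10x as the corpus does: O(n).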
[X] Wrong: "Autocomplete with edge n-grams always runs in constant time regardless of data size."
[OK] Correct: The search must check many tokens generated from all documents, so time grows with data size.
Understanding how autocomplete scales helps you explain search performance and design better user experiences.
What if we replaced edge n-grams with a prefix query on a keyword field? How would the time complexity change?
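As a starting point for that question: a prefix query over a sorted keyword term dictionary can be modeled as a binary search to locate the first candidate, roughly O(log n), plus O(m) to collect the m matching terms (a toy sketch; Lucene's actual term dictionary is more sophisticated, but the asymptotics are similar):

```python
import bisect

def prefix_query(sorted_terms, prefix):
    # Binary-search the sorted term dictionary for the first term >= prefix,
    # then walk forward while terms still start with the prefix.
    i = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches

terms = sorted(["apple", "applesauce", "apricot", "banana", "grape"])
print(prefix_query(terms, "app"))  # ['apple', 'applesauce']
```

Edge n-grams trade index-time work and index size for a cheap exact term lookup at search time; a prefix query keeps the index small but does the range scan at query time.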