Fuzzy matching in Elasticsearch - Time & Space Complexity
When using fuzzy matching in Elasticsearch, it is important to understand how the search time changes as the data grows.
We want to know how the search cost grows when we allow for small spelling mistakes or typos.
Analyze the time complexity of the following Elasticsearch fuzzy query.
{
"query": {
"fuzzy": {
"name": {
"value": "roam",
"fuzziness": "AUTO"
}
}
}
}
This query searches for documents where the "name" field matches "roam" with some allowed typos.
Fuzzy matching involves checking many possible variations of the search word.
- Primary operation: Comparing the search term against many indexed terms with small differences allowed.
- How many times: Once for each candidate term in the index that could match.
As the number of indexed terms grows, the number of comparisons grows roughly in proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 comparisons |
| 100 | About 100 comparisons |
| 1000 | About 1000 comparisons |
Pattern observation: The work grows roughly in a straight line as the number of terms increases.
Time Complexity: O(n)
This means the search time grows linearly with the number of terms to check for fuzzy matches.
[X] Wrong: "Fuzzy matching is always fast no matter how many terms are in the index."
[OK] Correct: Because fuzzy matching must compare against many terms, the time grows as the index grows, so it can slow down with large data.
Understanding how fuzzy matching scales helps you explain search performance and design better queries in real projects.
What if we changed the fuzziness level from "AUTO" to a fixed number like 2? How would the time complexity change?