Search performance tuning in Elasticsearch - Time & Space Complexity
When tuning search performance in Elasticsearch, we want to understand how the time it takes to find results changes as the data grows.
We ask: How does search speed change when we add more documents or queries?
Analyze the time complexity of this Elasticsearch search query with filters and sorting.
GET /products/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category": "books" }},
        { "range": { "price": { "lte": 20 }}}
      ]
    }
  },
  "sort": [ { "rating": "desc" } ]
}
This query filters products by category and price, then sorts results by rating.
Look for repeated work inside the search process.
- Primary operation: Visiting each document that passes the filters and comparing its rating to rank the results.
- How many times: Once per matching document in the filtered set.
As the number of documents grows, the search engine checks more items to find matches and sort them.
| Matching Documents (n) | Approx. Operations |
|---|---|
| 10 | About 10 checks and sorts |
| 100 | About 100 checks and sorts |
| 1000 | About 1000 checks and sorts |
Pattern observation: The work grows roughly in direct proportion to the number of matching documents.
Time Complexity: O(n)
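This means the search time grows linearly with the number of documents that match the filters. Sorting does not require a full sort of all n matches: only the top `size` hits (10 by default) are kept while matches are collected, so the ranking cost per document stays small.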
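The linear growth can be illustrated with a small Python model. This is a simplified sketch, not Elasticsearch's actual implementation: the `top_hits` function, the in-memory document list, and the `visits` counter are hypothetical stand-ins. It assumes the filters have already produced the matching documents, visits each one once, and keeps only the best `size` ratings in a heap.

```python
import heapq


def top_hits(matching_docs, size=10):
    """Return (hits, visits): the `size` highest-rated docs and how many docs were visited."""
    heap = []    # min-heap of (rating, position) for the best `size` docs seen so far
    visits = 0
    for position, doc in enumerate(matching_docs):
        visits += 1
        entry = (doc["rating"], position)
        if len(heap) < size:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Better than the weakest kept hit, so swap it in.
            heapq.heapreplace(heap, entry)
    ranked = sorted(heap, reverse=True)
    hits = [matching_docs[pos] for _, pos in ranked]
    return hits, visits


for n in (10, 100, 1000):
    docs = [{"id": i, "rating": (i * 7) % 50 / 10} for i in range(n)]
    hits, visits = top_hits(docs, size=10)
    print(n, visits, len(hits))
# 10 10 10
# 100 100 10
# 1000 1000 10
```

The visit count tracks n exactly, while the number of hits kept stays fixed at `size`, which is why the overall cost in this model is dominated by the number of matching documents.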
[X] Wrong: "Adding filters always makes search faster because it reduces data to check."
[OK] Correct: Filters only help when the index can answer them directly. A filter the index cannot answer efficiently, such as a wildcard pattern, can make Elasticsearch examine many documents anyway.
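The gap between a filter the index can answer and one it cannot can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the four sample documents, the dictionary used as an inverted index, and the `term_filter` and `contains_filter` helpers. The second filter stands in for any condition that cannot be resolved by a direct lookup; it is not a description of how Elasticsearch executes a specific query type.

```python
from collections import defaultdict

docs = [
    {"id": 1, "category": "books"},
    {"id": 2, "category": "games"},
    {"id": 3, "category": "books"},
    {"id": 4, "category": "audiobooks"},
]

# Toy inverted index: exact category value -> ids of documents holding it.
inverted_index = defaultdict(list)
for doc in docs:
    inverted_index[doc["category"]].append(doc["id"])


def term_filter(value):
    """Exact match: a single index lookup, independent of the number of documents."""
    return inverted_index.get(value, []), 1


def contains_filter(fragment):
    """Substring match: the toy index cannot help, so every document is examined."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if fragment in doc["category"]:
            matches.append(doc["id"])
    return matches, examined


print(term_filter("books"))      # ([1, 3], 1)
print(contains_filter("books"))  # ([1, 3, 4], 4)
```

In this model the exact-match filter costs one lookup regardless of data size, while the substring filter's cost grows with every document added.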
Understanding how search time grows helps you explain how to keep queries fast as data grows, a key skill in real projects.
"What if we added a full-text search instead of a term filter? How would the time complexity change?"
Practice
Solution
Step 1: Understand result limiting
Limiting results with `size` reduces the data processed and returned, speeding up queries.
Step 2: Evaluate other options
Increasing shards without limit can hurt performance, disabling cache reduces speed, and wildcard queries are slow.
Final Answer:
Limit the number of results returned using the `size` parameter -> Option A
Quick Check:
Limiting results = faster search [OK]
Use `size` to limit results for faster queries [OK]
Common mistakes:
- Thinking more shards always improve speed
- Ignoring caching benefits
- Using wildcard queries on all fields
title and author?
Solution
Step 1: Identify correct field limiting syntax
Elasticsearch uses `_source` to specify which fields to return.
Step 2: Check other options
`fields`, `select`, and `return_fields` are not valid for limiting returned fields in this context.
Final Answer:
{"_source": ["title", "author"], "query": {"match_all": {}}} -> Option D
Quick Check:
Use `_source` to limit fields [OK]
Use `_source` to specify returned fields [OK]
Common mistakes:
- Using `fields` instead of `_source`
- Trying SQL-like `select` syntax
- Using unsupported keys like `return_fields`
"timeout": "2s"?
{
  "query": {"match": {"content": "fast search"}},
  "timeout": "2s"
}
Solution
Step 1: Understand timeout behavior
Elasticsearch's `timeout` stops the query after the specified time and returns partial results if available.
Step 2: Evaluate other options
It does not fail immediately, does not delay start, and does not control caching.
Final Answer:
The query will return partial results after 2 seconds -> Option C
Quick Check:
`timeout` returns partial results [OK]
Common mistakes:
- Assuming timeout causes query failure
- Thinking timeout delays query start
- Confusing timeout with caching duration
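Because a timed-out search still returns a normal response, client code has to check whether the results are partial. A search response carries a top-level `timed_out` boolean for this. The sketch below assumes the response has already been parsed into a Python dict; the `summarize` helper and the hand-written `sample` response are hypothetical and only illustrate the check.

```python
def summarize(response):
    """Describe whether a parsed search response is complete or partial."""
    hits = response["hits"]["hits"]
    if response.get("timed_out", False):
        return f"partial: {len(hits)} hits collected before the timeout"
    return f"complete: {len(hits)} hits"


# Hand-written sample shaped like a search response; the values are made up.
sample = {
    "took": 2000,
    "timed_out": True,
    "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]},
}
print(summarize(sample))  # partial: 2 hits collected before the timeout
```

Treating `timed_out: true` as a signal to retry, narrow the query, or warn the user is an application-level choice; the flag itself only reports that collection stopped early.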
{
  "size": 10,
  "query": {
    "_source": ["title", "date"],
    "match_all": {}
  }
}
But the query returns all fields. What is the likely mistake?
Solution
Step 1: Check placement of `_source`
`_source` must be at the top level of the request body, not inside `query`.
Step 2: Review other options
`fields` is deprecated for this purpose, `size` is correct, and `match_all` does not ignore field limits.
Final Answer:
Using `_source` inside the query body instead of top-level -> Option B
Quick Check:
`_source` must be top-level [OK]
Place `_source` at top level, not inside query [OK]
Common mistakes:
- Putting `_source` inside `query`
- Confusing `size` with `limit`
- Assuming `match_all` ignores field filtering
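The fix amounts to moving one key up a level. As a rough illustration, the Python sketch below takes the request body from the question as a dict and returns a corrected copy. The `lift_source` helper is hypothetical, written only for this example; it is not part of any Elasticsearch client.

```python
import json

# The request body from the question, with `_source` misplaced inside `query`.
broken = {
    "size": 10,
    "query": {"_source": ["title", "date"], "match_all": {}},
}


def lift_source(body):
    """Return a copy of `body` with a misplaced `_source` moved to the top level."""
    fixed = dict(body)
    query = dict(body.get("query", {}))
    if "_source" in query:
        fixed["_source"] = query.pop("_source")
        fixed["query"] = query
    return fixed


print(json.dumps(lift_source(broken), indent=2))
```

The corrected body has `size`, `query`, and `_source` as siblings, which is the placement the solution describes.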
id and summary fields, and must respond within 1 second. Which combination of settings best improves performance?
Solution
Step 1: Limit results and fields
Setting `size` low reduces returned documents; `_source` limits fields to needed ones.
Step 2: Use timeout to keep response fast
Adding a `timeout` of 1 second keeps the query from running unbounded, so the system stays responsive.
Step 3: Evaluate other options
A high `size` and disabling `_source` filtering increase load; wildcard queries are slow; increasing shards without need can hurt performance.
Final Answer:
Set `size` to a low number, use `_source` to limit fields, and add a `timeout` of 1s -> Option A
Quick Check:
Limit size + fields + timeout = best performance [OK]
Common mistakes:
- Setting size too high
- Disabling field filtering
- Ignoring timeout setting
- Increasing shards unnecessarily
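The three settings from the answer can be combined in one request body. The Python sketch below builds such a body as a dict; the `build_search_body` helper and the example `match` query on `summary` are assumptions made for illustration, while `size`, `_source`, and `timeout` are the request parameters discussed above.

```python
import json


def build_search_body(query, fields, size=10, timeout="1s"):
    """Build a search request body that limits hits, returned fields, and run time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return {
        "size": size,              # return only a few hits
        "_source": list(fields),   # return only the fields the caller needs
        "timeout": timeout,        # stop collecting hits after this long
        "query": query,
    }


body = build_search_body(
    {"match": {"summary": "fast search"}},  # hypothetical query for this example
    fields=["id", "summary"],
    size=5,
)
print(json.dumps(body, indent=2))
```

Keeping the limits in one helper makes it harder to send an unbounded request by accident, though suitable values for `size` and `timeout` depend on the application.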
