Discover for data exploration in Elasticsearch - Time Complexity
When using Discover in Elasticsearch for data exploration, it's important to understand how search and retrieval time changes as your data grows: specifically, how the time to get results scales with the number of documents explored.
Analyze the time complexity of this Elasticsearch query used in Discover:
```json
GET /my-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 50,
  "sort": [
    { "@timestamp": "desc" }
  ]
}
```
This query fetches the 50 most recent documents from the index, sorted by `@timestamp` in descending order.
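The retrieval step can be modeled in plain Python (a minimal sketch of the idea, not Elasticsearch internals): selecting the 50 newest documents from a collection of n timestamped documents. The document shapes and field values here are illustrative.

```python
import heapq

def latest_k(docs, k=50):
    """Return the k documents with the largest @timestamp, newest first.

    Loosely models how a search engine keeps only the top-k candidates
    while it walks over every matching document.
    """
    # heapq.nlargest visits all n docs but keeps at most k in memory.
    return heapq.nlargest(k, docs, key=lambda d: d["@timestamp"])

# Hypothetical documents: timestamps 0..999.
docs = [{"@timestamp": t, "message": f"event {t}"} for t in range(1000)]
top = latest_k(docs, k=50)
print(top[0]["@timestamp"], top[-1]["@timestamp"])  # 999 ... 950
```

Even though only 50 documents come back, `latest_k` had to look at all 1000 to know which 50 are newest.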
Let's find the main repeated work:
- Primary operation: Elasticsearch visits each matching document and compares its `@timestamp` against the current top-50 candidates.
- How many times: with `match_all`, every document in the index matches, so all n documents must be considered before the top 50 can be returned.
As the number of documents grows, Elasticsearch needs to look through more data to find the latest 50.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Checks about 10 documents |
| 100 | Checks about 100 documents |
| 1000 | Checks about 1000 documents |
Pattern observation: The work grows roughly in direct proportion to the number of documents in the index.
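The table's pattern can be checked with a simple counter that increments once per document examined (a sketch assuming uniform cost per document; all names are illustrative):

```python
import heapq

def count_scanned(n, k=50):
    """Count how many documents are examined to select the latest k of n."""
    examined = 0
    heap = []  # min-heap holding the k newest timestamps seen so far
    for ts in range(n):      # every document is visited exactly once
        examined += 1
        if len(heap) < k:
            heapq.heappush(heap, ts)
        elif ts > heap[0]:
            heapq.heapreplace(heap, ts)  # evict the oldest candidate
    return examined

for n in (10, 100, 1000):
    print(n, count_scanned(n))  # examined grows in lockstep with n
```

The number of examined documents equals n for every row of the table, matching the linear pattern above.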
Time Complexity: O(n)
This means the time to get results grows linearly as the number of documents increases. (Maintaining the top-50 candidate set costs only a small constant amount of extra work per document, since 50 is fixed, so the linear scan dominates.)
[X] Wrong: "Fetching a fixed number of results always takes the same time no matter how big the data is."
[OK] Correct: Even if you want only 50 results, Elasticsearch may need to scan many documents to find the latest ones, so time grows with data size.
Understanding how search time grows with data size helps you explain how Elasticsearch handles queries efficiently and what to expect as data grows.
What if we added a filter to the query to only look for documents with a specific field value? How would the time complexity change?
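One way to reason about it: a selective filter shrinks the candidate set from n documents to the m that match, so the per-query work tracks m rather than n. A rough sketch of that idea, using a hypothetical `status` field (in Elasticsearch itself the inverted index locates matching documents without scanning every one, so a selective filter can be much faster in practice than this linear model suggests):

```python
import heapq

def latest_k_filtered(docs, value, k=50):
    """Top-k newest among documents whose 'status' equals value."""
    matching = (d for d in docs if d["status"] == value)  # m of n docs
    return heapq.nlargest(k, matching, key=lambda d: d["@timestamp"])

# Hypothetical data: 1 in 10 documents has status "error".
docs = [{"@timestamp": t, "status": "error" if t % 10 == 0 else "ok"}
        for t in range(1000)]
errors = latest_k_filtered(docs, "error")
print(len(errors), errors[0]["@timestamp"])
```

Here only 100 of the 1000 documents match, so the top-50 selection operates on a much smaller set; the complexity becomes O(m) in the matching documents, plus the cost of finding them.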