TF-IDF and BM25 scoring in Elasticsearch - Time & Space Complexity
When searching documents, Elasticsearch uses scoring methods like TF-IDF and BM25 to rank results.
We want to understand how the time to calculate these scores grows as the number of documents or terms increases.
Analyze the time complexity of this Elasticsearch query using BM25 scoring.
GET /my_index/_search
{
"query": {
"match": {
"content": "search terms here"
}
}
}
This query searches documents matching the given terms and scores them using BM25 by default.
Look at what repeats during scoring:
- Primary operation: Calculating term frequency and inverse document frequency for each term in each document.
- How many times: For each query term, Elasticsearch checks every document containing that term.
As the number of documents and query terms grow, the scoring work increases.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 documents, 2 terms | About 20 checks (2 terms x 10 docs) |
| 100 documents, 2 terms | About 200 checks |
| 1000 documents, 5 terms | About 5000 checks |
Pattern observation: The work grows roughly with the number of documents times the number of query terms.
Time Complexity: O(n * m)
This means the scoring time grows proportionally with the number of documents n and the number of query terms m.
[X] Wrong: "Scoring time depends only on the number of query terms, not on documents."
[OK] Correct: Each term must be checked in every document that contains it, so more documents mean more scoring work.
Understanding how scoring scales helps you explain search performance and design efficient queries in real projects.
What if we changed the scoring method from BM25 to a simpler constant score? How would the time complexity change?