Elasticsearchquery~5 mins

TF-IDF and BM25 scoring in Elasticsearch - Time & Space Complexity

Choose your learning style9 modes available

Time Complexity: TF-IDF and BM25 scoring

O(n * m)

Understanding Time Complexity

When searching documents, Elasticsearch uses scoring methods like TF-IDF and BM25 to rank results.

We want to understand how the time to calculate these scores grows as the number of documents or terms increases.

Scenario Under Consideration

Analyze the time complexity of this Elasticsearch query using BM25 scoring.


GET /my_index/_search
{
  "query": {
    "match": {
      "content": "search terms here"
    }
  }
}

This query searches documents matching the given terms and scores them using BM25 by default.

Identify Repeating Operations

Look at what repeats during scoring:

Primary operation: Calculating term frequency and inverse document frequency for each term in each document.
How many times: For each query term, Elasticsearch checks every document containing that term.

How Execution Grows With Input

As the number of documents and query terms grow, the scoring work increases.

Pattern observation: The work grows roughly with the number of documents times the number of query terms.

Final Time Complexity

Time Complexity: O(n * m)

This means the scoring time grows proportionally with the number of documents n and the number of query terms m.

Common Mistake

[X] Wrong: "Scoring time depends only on the number of query terms, not on documents."

[OK] Correct: Each term must be checked in every document that contains it, so more documents mean more scoring work.

Interview Connect

Understanding how scoring scales helps you explain search performance and design efficient queries in real projects.

Self-Check

What if we changed the scoring method from BM25 to a simpler constant score? How would the time complexity change?