
Hybrid search (semantic + keyword) in Prompt Engineering / GenAI - Model Metrics & Evaluation

Which metric matters for Hybrid Search and WHY

Hybrid search combines semantic understanding and keyword matching to retrieve the best results. The key metrics are recall and precision. Recall measures how many of the relevant results the search actually finds, which matters because a good answer that is never retrieved is useless. Precision measures how many of the returned results are actually relevant, which matters for avoiding noise. Since hybrid search balances meaning and exact wording, both metrics together tell you whether it finds enough good matches without returning too many wrong ones.

Confusion Matrix for Hybrid Search Results
      |----------|----------------|
      |          |   Predicted    |
      | Actual   | Relevant | Not |
      |----------|----------|-----|
      | Relevant |    TP    | FN  |
      | Not Rel. |    FP    | TN  |
      |----------|----------|-----|

      TP = Correctly found relevant results
      FP = Found results that are not relevant
      FN = Relevant results missed by search
      TN = Correctly ignored irrelevant results
    

Metrics use these counts:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
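
The two formulas above can be sketched directly from the confusion-matrix counts. The counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
# A minimal sketch: precision and recall computed from confusion-matrix counts.
def precision(tp: int, fp: int) -> float:
    """Fraction of returned results that are relevant: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    """Fraction of relevant results that were returned: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical search run: 40 relevant results found, 10 irrelevant ones
# returned, and 20 relevant results missed.
tp, fp, fn = 40, 10, 20
print(precision(tp, fp))  # 0.8
print(recall(tp, fn))     # 0.666...
```

Note the guard against empty denominators: a search that returns nothing has undefined precision, and a query with no relevant documents has undefined recall.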

Precision vs Recall Tradeoff with Examples

In hybrid search, tuning toward semantic matching tends to increase recall: the search finds more relevant results even when the wording differs. But it can lower precision by admitting looser, less exact matches. Tuning toward strict keyword matching tends to increase precision by returning only exact hits, but lowers recall by missing related results that are phrased differently.

Example 1: A legal document search needs high precision to avoid irrelevant cases. So keyword matching is emphasized.

Example 2: A customer support search wants high recall to find all helpful answers, so semantic search is emphasized.
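
One common way to implement this tradeoff (an assumption here, not something the lesson specifies) is a weighted blend of the two scores, where a single weight slides the system between the two examples above. All document names and scores below are hypothetical; a real system would plug in BM25 and embedding similarities:

```python
# Sketch of the precision/recall tuning knob as a weighted score fusion.
def hybrid_score(semantic: float, keyword: float, alpha: float) -> float:
    """alpha near 1.0 favors semantic matching (recall-oriented);
    alpha near 0.0 favors exact keyword matching (precision-oriented)."""
    return alpha * semantic + (1 - alpha) * keyword

# Two hypothetical documents: (semantic score, keyword score).
docs = {
    "exact-phrase match": (0.55, 0.95),   # strong keyword hit
    "paraphrased answer": (0.90, 0.20),   # strong semantic hit
}

for alpha in (0.2, 0.8):
    ranked = sorted(docs, key=lambda d: hybrid_score(*docs[d], alpha),
                    reverse=True)
    print(alpha, ranked)
```

With alpha = 0.2 (keyword-heavy, the legal-search setting) the exact-phrase match ranks first; with alpha = 0.8 (semantic-heavy, the support-search setting) the paraphrased answer wins.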

What Good vs Bad Metric Values Look Like

Good: precision and recall both above 0.8 mean the search finds most relevant results while keeping irrelevant ones rare.

Bad: Precision below 0.5 means many irrelevant results confuse users. Recall below 0.5 means many relevant results are missed.

Balanced metrics around 0.7 are often acceptable, depending on the use case.

Common Metrics Pitfalls in Hybrid Search
  • Accuracy paradox: when most candidates are irrelevant, a search that returns almost nothing can score high accuracy while still missing the results that matter.
  • Data leakage: Using test queries that appear in training can inflate metrics.
  • Overfitting: Tuning too much on keyword matching may miss semantic matches, hurting recall.
  • Ignoring user intent: Metrics alone don't capture if results satisfy user needs.
Self Check: Your model has 98% accuracy but 12% recall on relevant results. Is it good?

No, it is not good. The high accuracy likely comes from correctly ignoring the many irrelevant candidates while returning very few results. But 12% recall means it misses 88% of relevant results, so users won't find what they need. Improving recall is critical for hybrid search to be useful.

Key Result
Hybrid search needs balanced precision and recall to find relevant results without too much noise.