Hybrid search combines semantic understanding (matching meaning) with keyword matching (matching exact words) to find the best results. The key metrics for evaluating it are recall and precision. Recall measures how many of the relevant results the search finds; it matters when missing a good answer is costly. Precision measures how many of the returned results are actually relevant; it matters when irrelevant results (noise) are costly. Because hybrid search balances meaning against exact words, checking both metrics shows whether it finds enough good matches without returning too many wrong ones.
Hybrid search (semantic + keyword) in Prompt Engineering / GenAI - Model Metrics & Evaluation
|                       | Predicted relevant | Predicted not relevant |
|-----------------------|--------------------|------------------------|
| Actually relevant     | TP                 | FN                     |
| Actually not relevant | FP                 | TN                     |
TP (true positive) = relevant results the search correctly returned
FP (false positive) = returned results that are not relevant
FN (false negative) = relevant results the search missed
TN (true negative) = irrelevant results the search correctly left out
Metrics use these counts:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
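A minimal Python sketch of these two formulas for a single query, assuming the search results and the ground-truth relevance judgments are both given as collections of document IDs. The function name `precision_recall` and the document IDs are hypothetical, and returning 0.0 when a denominator would be zero is a convention chosen here, not part of the definitions.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: IDs of the documents the search returned.
    relevant:  IDs of the documents judged relevant (ground truth).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # relevant results that were returned
    fp = len(retrieved - relevant)  # returned results that are not relevant
    fn = len(relevant - retrieved)  # relevant results that were missed
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 of them relevant,
# out of 5 relevant documents in total.
p, r = precision_recall(["d1", "d2", "d3", "d7"], ["d1", "d2", "d3", "d4", "d5"])
print(p, r)  # 0.75 0.6
```

Here TP = 3, FP = 1 and FN = 2, so precision = 3 / 4 = 0.75 and recall = 3 / 5 = 0.6.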
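In hybrid search, weighting semantic matching more heavily can increase recall, because it finds relevant results even when their wording differs from the query. It may also lower precision, because loosely related results that are not exact matches get returned too. Weighting strict keyword matching more heavily tends to do the opposite: precision rises because returned results contain the exact query terms, but recall falls because relevant results phrased differently are missed.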
Example 1: A legal document search needs high precision to avoid surfacing irrelevant cases, so keyword matching is emphasized.
Example 2: A customer support search needs high recall so that every helpful answer is found, so semantic matching is emphasized.
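The trade-off can be reproduced on a toy example. This sketch assumes a weighted blend `alpha * semantic + (1 - alpha) * keyword`, a fixed cut-off of 0.5 for returning a document, and made-up scores and relevance labels; none of these values come from a real system, and the weighted blend is one common way to tune the balance, not necessarily the only one.

```python
# Hypothetical documents: (semantic_score, keyword_score, is_relevant).
DOCS = {
    "d1": (0.9, 0.9, True),   # relevant, uses the query's exact words
    "d2": (0.8, 0.1, True),   # relevant paraphrase, few shared keywords
    "d3": (0.7, 0.2, True),   # relevant paraphrase
    "d4": (0.7, 0.0, False),  # same topic, but not what the user asked for
    "d5": (0.5, 0.1, False),
    "d6": (0.4, 0.8, True),   # relevant exact keyword hit
}
THRESHOLD = 0.5  # assumed cut-off: return documents scoring at least this much


def evaluate(alpha):
    """Blend scores as alpha*semantic + (1 - alpha)*keyword, then score the result set."""
    retrieved = {d for d, (s, k, _) in DOCS.items()
                 if alpha * s + (1 - alpha) * k >= THRESHOLD}
    relevant = {d for d, (_, _, rel) in DOCS.items() if rel}
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant)
    return precision, recall


for alpha in (0.2, 0.8):  # 0.2 = keyword-heavy, 0.8 = semantic-heavy
    p, r = evaluate(alpha)
    print(f"alpha={alpha}: precision={p:.2f}, recall={r:.2f}")
```

With alpha = 0.2 only d1 and d6 pass the cut-off (precision 1.00, recall 0.50). With alpha = 0.8, d1 through d4 pass: recall rises to 0.75, but precision drops to 0.75 because the off-target d4 is returned, and the exact keyword hit d6 now falls just below the cut-off.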
Good: Precision and recall both above 0.8 means the search finds most relevant results and returns few irrelevant ones.
Bad: Precision below 0.5 means many irrelevant results confuse users. Recall below 0.5 means many relevant results are missed.
Balanced metrics around 0.7 are often acceptable, depending on the use case.
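The thresholds above can be written as a small helper. The function name `rate_search` is hypothetical, and the cut-offs are the rough rules of thumb from this section, not a standard.

```python
def rate_search(precision: float, recall: float) -> str:
    """Classify search quality using this section's rough thresholds."""
    if precision > 0.8 and recall > 0.8:
        return "good"  # finds most relevant results, few irrelevant ones
    if precision < 0.5 or recall < 0.5:
        return "bad"  # too much noise, or too many relevant results missed
    return "depends on use case"  # e.g. both metrics around 0.7


print(rate_search(0.85, 0.9))   # good
print(rate_search(0.95, 0.12))  # bad
print(rate_search(0.7, 0.7))    # depends on use case
```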
- Accuracy paradox: For any query, most documents in the collection are irrelevant, so true negatives dominate. A search that returns very few results can therefore score high accuracy while still missing most relevant documents.
- Data leakage: Evaluating on test queries that also appeared in the tuning or training data inflates the metrics.
- Overfitting: Tuning too heavily toward keyword matching may miss semantic matches, hurting recall.
- Ignoring user intent: Metrics alone don't show whether the results actually satisfy user needs.
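One simple guard against the data-leakage pitfall is to check whether any evaluation query also appears among the queries used to tune the search. A minimal sketch with a hypothetical `leaked_queries` helper; it only catches duplicates that are identical after normalizing case and whitespace, so paraphrased duplicates would need a fuzzier check.

```python
def leaked_queries(tuning_queries, test_queries):
    """Return test queries that also appear in the tuning set.

    Queries are compared after lower-casing and collapsing whitespace, so
    trivially different copies of the same query are still caught.
    """
    def normalize(q):
        return " ".join(q.lower().split())

    tuning = {normalize(q) for q in tuning_queries}
    return {normalize(q) for q in test_queries} & tuning


# Hypothetical query sets.
tuning_set = ["Reset password", "refund policy", "change email address"]
test_set = ["reset  password", "shipping time"]
print(leaked_queries(tuning_set, test_set))  # {'reset password'}
```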
Is a hybrid search with high accuracy but only 12% recall good? No. The high accuracy most likely comes from the many irrelevant documents it correctly leaves out (true negatives), because it returns very few results. A recall of 12% means it misses 88% of the relevant results, so users won't find what they need. Improving recall is critical for the hybrid search to be useful.
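The arithmetic behind this answer can be checked with hypothetical counts: a collection of 10,000 documents, 100 of them relevant to the query, and a search that returns only 12 documents, all of them relevant. The counts and the helper name `metrics` are made up for illustration; accuracy is computed as (TP + TN) / total.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical query: 100 relevant documents among 10,000; 12 returned, all relevant.
acc, prec, rec = metrics(tp=12, fp=0, fn=88, tn=9900)
print(f"accuracy={acc:.4f} precision={prec:.2f} recall={rec:.2f}")
# accuracy=0.9912 precision=1.00 recall=0.12
```

Accuracy is above 99% only because 9,900 irrelevant documents were correctly left out, while recall shows that 88 of the 100 relevant documents were missed.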
Practice
Solution
Step 1: Understand keyword and semantic search roles
Keyword search finds exact word matches; semantic search finds matches in meaning.
Step 2: Combine both for better results
Hybrid search uses both to improve relevance and user satisfaction.
Final Answer:
It improves search relevance by using both exact words and meaning.
Quick Check:
Hybrid search = better relevance [OK]
Common mistakes:
- Thinking hybrid search uses only keywords
- Assuming semantic search ignores keywords
- Believing hybrid search always slows down search
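A toy illustration of why combining both signals helps. It assumes hand-picked 3-dimensional vectors in place of real embedding-model output, cosine similarity as the semantic score, the fraction of query words found verbatim in a document as the keyword score, and a plain sum to combine them; the document texts, vectors, and function names are all hypothetical.

```python
import math

# Toy corpus with hypothetical, hand-picked "embeddings".
# A real system would get these vectors from an embedding model.
DOCS = {
    "reset your password":        [0.9, 0.1, 0.0],
    "recover account access":     [0.8, 0.2, 0.1],  # same meaning, different words
    "password policy for admins": [0.2, 0.9, 0.1],  # shares a keyword, different intent
}
QUERY_TEXT = "forgot password"
QUERY_VEC = [0.85, 0.15, 0.05]  # hypothetical embedding of the query


def cosine(a, b):
    """Cosine similarity between two vectors (the semantic score here)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_score(query, doc):
    """Fraction of query words that appear verbatim in the document."""
    q_words = query.lower().split()
    d_words = set(doc.lower().split())
    return sum(w in d_words for w in q_words) / len(q_words)


def hybrid_search(query_text, query_vec, docs):
    """Score every document by semantic + keyword score, best first."""
    scored = []
    for text, vec in docs.items():
        sem = cosine(query_vec, vec)
        kw = keyword_score(query_text, text)
        scored.append((sem + kw, sem, kw, text))
    return sorted(scored, reverse=True)


for total, sem, kw, text in hybrid_search(QUERY_TEXT, QUERY_VEC, DOCS):
    print(f"{total:.2f} (semantic={sem:.2f}, keyword={kw:.2f})  {text}")
```

The document that matches on both meaning and wording ranks first. "recover account access" shares no words with the query but still ranks second on meaning alone, which pure keyword search would miss, while the keyword-only match about admin password policy ranks last.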
Solution
Step 1: Understand score combination methods
Adding scores balances the contributions of the semantic and keyword parts.
Step 2: Choose addition for hybrid scoring
Adding the semantic and keyword scores is a common way to combine relevance signals.
Final Answer:
final_score = semantic_score + keyword_score
Quick Check:
Hybrid score = sum of semantic and keyword [OK]
Common mistakes:
- Multiplying scores, which can produce very small or very large values
- Subtracting scores, which cancels out positive relevance
- Dividing scores, which fails when the denominator is zero
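In practice the combined score is used to rank documents. A minimal sketch with hypothetical documents and scores, using the plain sum from this answer; the names `docs` and `hybrid_rank` are illustrative only.

```python
# Hypothetical per-document scores: (semantic_score, keyword_score).
docs = {
    "doc_a": (0.3, 0.9),  # strong keyword match, weak semantic match
    "doc_b": (0.8, 0.2),  # strong semantic match, weak keyword match
    "doc_c": (0.6, 0.7),  # decent on both signals
}


def hybrid_rank(scores):
    """Return document IDs sorted by semantic_score + keyword_score, best first."""
    final = {doc: sem + kw for doc, (sem, kw) in scores.items()}
    return sorted(final, key=final.get, reverse=True)


print(hybrid_rank(docs))  # ['doc_c', 'doc_a', 'doc_b']
```

doc_c ranks first because it is decent on both signals (0.6 + 0.7), ahead of doc_a, which is strong only on keywords (0.3 + 0.9), and doc_b, which is strong only on meaning (0.8 + 0.2).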
semantic_scores = [0.8, 0.5, 0.3]
keyword_scores = [0.6, 0.7, 0.4]
final_scores = [s + k for s, k in zip(semantic_scores, keyword_scores)]
print(final_scores)
What is the output?
Solution
Step 1: Add corresponding semantic and keyword scores
0.8 + 0.6 = 1.4, 0.5 + 0.7 = 1.2, 0.3 + 0.4 = 0.7
Step 2: Create list of summed scores
final_scores = [1.4, 1.2, 0.7]
Final Answer:
[1.4, 1.2, 0.7]
Quick Check:
Sum pairs = [1.4, 1.2, 0.7] [OK]
Common mistakes:
- Multiplying instead of adding scores
- Mixing order of scores in zip
- Confusing subtraction with addition
semantic_scores = [0.9, 0.4, 0.7]
keyword_scores = [0.5, 0.6]
final_scores = [s + k for s, k in zip(semantic_scores, keyword_scores)]
print(final_scores)
Solution
Step 1: Check list lengths
semantic_scores has 3 items; keyword_scores has 2 items.
Step 2: Understand zip behavior
zip stops at the end of the shorter list, so the last semantic score (0.7) is silently dropped: the code runs without error but prints only two scores.
Final Answer:
The lists have different lengths, so one score goes missing.
Quick Check:
Unequal list lengths truncate results [OK]
Common mistakes:
- Assuming zip pads shorter list automatically
- Thinking zip causes syntax error
- Believing multiplication is required for hybrid scores
Solution
Step 1: Identify weighting requirement
Semantic similarity should count twice as much as the keyword score.
Step 2: Apply weights in formula
Multiply semantic_score by 2, then add keyword_score.
Final Answer:
final_score = 2 * semantic_score + keyword_score
Quick Check:
Semantic weighted double = 2 * semantic + keyword [OK]
Common mistakes:
- Weighting keyword score instead of semantic
- Multiplying all scores together
- Dividing sum instead of weighting
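The weights can be made parameters so they are easy to tune. A minimal sketch with a hypothetical `weighted_hybrid` function whose defaults give semantic similarity double weight, applied to two made-up documents:

```python
def weighted_hybrid(semantic_score, keyword_score, w_semantic=2.0, w_keyword=1.0):
    """Weighted hybrid score; the defaults count semantic similarity double."""
    return w_semantic * semantic_score + w_keyword * keyword_score


# Hypothetical scores for two documents: (semantic_score, keyword_score).
docs = {"doc_a": (0.6, 0.2), "doc_b": (0.3, 0.7)}

# Equal weights vs. doubled semantic weight.
plain = sorted(docs, key=lambda d: weighted_hybrid(*docs[d], w_semantic=1.0), reverse=True)
doubled = sorted(docs, key=lambda d: weighted_hybrid(*docs[d]), reverse=True)
print(plain)    # ['doc_b', 'doc_a']
print(doubled)  # ['doc_a', 'doc_b']
```

With equal weights doc_b wins (0.3 + 0.7 = 1.0 vs 0.6 + 0.2 = 0.8); doubling the semantic weight flips the order (2 * 0.6 + 0.2 = 1.4 vs 2 * 0.3 + 0.7 = 1.3).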
