Hybrid search (semantic + keyword) in Prompt Engineering / GenAI - Model Pipeline Trace

Model Pipeline - Hybrid search (semantic + keyword)

This hybrid search pipeline combines keyword matching with semantic understanding to find the most relevant documents. It first filters documents by keywords, then ranks them by semantic similarity using a trained model.

Data Flow - 5 Stages
1. Input Query
   Input: 1 query string
   Operation: User inputs a search query
   Output: 1 query string
   Example: "best Italian restaurants near me"
2. Keyword Filtering
   Input: N documents x text
   Operation: Filter documents containing query keywords
   Output: M documents x text (M ≤ N)
   Example: From 1000 docs, filter to 150 docs containing words like 'Italian', 'restaurants'
3. Semantic Embedding
   Input: 1 query string and M documents x text
   Operation: Convert query and documents to vector embeddings
   Output: 1 query vector (768 dims), M document vectors (768 dims each)
   Example: Query vector: [0.12, -0.05, ..., 0.33], Document vector: [0.10, -0.02, ..., 0.30]
4. Similarity Scoring
   Input: 1 query vector, M document vectors
   Operation: Calculate cosine similarity between query and each document vector
   Output: M similarity scores (floats between -1 and 1)
   Example: [0.85, 0.78, 0.65, ...]
5. Ranking and Output
   Input: M documents with similarity scores
   Operation: Sort documents by similarity score descending
   Output: Top K documents ranked
   Example: Top 5 documents with scores: [(doc23, 0.85), (doc7, 0.83), ...]
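The five stages above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual implementation: `toy_embed` is a hypothetical stand-in that builds token-count vectors rather than calling a trained 768-dimension embedding model, and the function names and sample documents are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_filter(query, docs):
    """Stage 2: keep documents that share at least one token with the query."""
    query_terms = set(tokenize(query))
    return {
        doc_id: text
        for doc_id, text in docs.items()
        if query_terms & set(tokenize(text))
    }


def toy_embed(text):
    """Stage 3 stand-in (assumption): token counts, NOT a trained embedding."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Stage 4: cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query, docs, top_k=5):
    """Stages 2 to 5: filter, embed, score, then rank descending."""
    candidates = keyword_filter(query, docs)
    query_vec = toy_embed(query)
    scored = [
        (doc_id, cosine_similarity(query_vec, toy_embed(text)))
        for doc_id, text in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up sample corpus for illustration only.
docs = {
    "doc1": "Best Italian restaurants in the city center",
    "doc2": "Italian grammar lessons for beginners",
    "doc3": "Car repair shops open on Sunday",
}
results = hybrid_search("best Italian restaurants near me", docs, top_k=2)
print(results)
```

Here "doc3" never reaches the scoring stage because it shares no keyword with the query, which is the cost saving the filtering stage is meant to provide. In a real system, `toy_embed` would be replaced by a call to whichever embedding model the pipeline uses.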
Training Trace - Epoch by Epoch

[Chart: loss (y-axis, 0.2 to 0.7) by epoch (x-axis, epochs 1 to 5)]
Epoch  Loss ↓  Accuracy ↑  Observation
1      0.65    0.60        Model starts learning semantic relations, initial moderate accuracy
2      0.48    0.72        Loss decreases, accuracy improves as embeddings better capture meaning
3      0.35    0.81        Model converges, semantic similarity scores become more reliable
4      0.30    0.85        Fine tuning improves ranking quality, loss stabilizes
5      0.28    0.87        Final epoch shows best balance of loss and accuracy
Prediction Trace - 4 Layers
Layer 1: Keyword Filtering
Layer 2: Semantic Embedding
Layer 3: Similarity Scoring
Layer 4: Ranking
Model Quiz - 3 Questions
Test your understanding
What is the main purpose of the keyword filtering stage?
A. To reduce the number of documents before semantic comparison
B. To convert text into vectors
C. To calculate similarity scores
D. To rank documents by relevance
Key Insight
Combining keyword filtering with semantic similarity balances speed and understanding, enabling efficient and meaningful search results.

Practice

1. What is the main advantage of hybrid search combining semantic and keyword methods?
easy
A. It improves search relevance by using both exact words and meaning.
B. It only uses exact keyword matching for faster results.
C. It ignores word meanings to focus on keyword frequency.
D. It replaces keywords with random words for variety.

Solution

  1. Step 1: Understand keyword and semantic search roles

    Keyword search finds exact word matches; semantic search finds meaning matches.
  2. Step 2: Combine both for better results

    Hybrid search uses both to improve relevance and user satisfaction.
  3. Final Answer:

    It improves search relevance by using both exact words and meaning. -> Option A
  4. Quick Check:

    Hybrid search = better relevance [OK]
Hint: Hybrid = exact words + meaning for best results [OK]
Common Mistakes:
  • Thinking hybrid search uses only keywords
  • Assuming semantic search ignores keywords
  • Believing hybrid search always slows down search
2. Which of the following is a common way to combine semantic and keyword scores in hybrid search?
easy
A. final_score = semantic_score * keyword_score
B. final_score = semantic_score / keyword_score
C. final_score = semantic_score - keyword_score
D. final_score = semantic_score + keyword_score

Solution

  1. Step 1: Understand score combination methods

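    Adding the two scores lets both signals contribute to the final ranking. This works best when both scores are on comparable scales (for example, each normalized to the range 0 to 1).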
  2. Step 2: Choose addition for hybrid scoring

    Adding semantic and keyword scores is common to combine relevance signals.
  3. Final Answer:

    final_score = semantic_score + keyword_score -> Option D
  4. Quick Check:

    Hybrid score = sum of semantic and keyword [OK]
Hint: Add scores to combine semantic and keyword relevance [OK]
Common Mistakes:
  • Multiplying scores, which can produce very small or very large values
  • Subtracting scores, which discards positive relevance from one signal
  • Dividing scores, which fails when the denominator is zero
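A plain sum only treats the two signals evenly if they live on similar scales. The sketch below shows one way to handle that, assuming min-max normalization before adding. The `min_max_normalize` helper and the sample keyword scores are hypothetical and chosen for illustration; other schemes (such as rank-based fusion) are also used in practice.

```python
def min_max_normalize(scores):
    """Rescale scores to the range [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Cosine similarities are already bounded, but keyword scores often are not.
semantic_scores = [0.85, 0.78, 0.65]
keyword_scores = [12.4, 3.1, 7.9]  # made-up, unbounded keyword-match scores

normalized_semantic = min_max_normalize(semantic_scores)
normalized_keyword = min_max_normalize(keyword_scores)

# Option D from the question: add the two signals per document.
final_scores = [s + k for s, k in zip(normalized_semantic, normalized_keyword)]
print(final_scores)
```

Without the normalization step, the keyword scores in this example would dominate the sum simply because their raw values are larger.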
3. Given the code snippet:
semantic_scores = [0.8, 0.5, 0.3]
keyword_scores = [0.6, 0.7, 0.4]
final_scores = [s + k for s, k in zip(semantic_scores, keyword_scores)]
print(final_scores)

What is the output?
medium
A. [1.4, 1.2, 0.7]
B. [0.2, -0.2, -0.1]
C. [0.48, 0.35, 0.12]
D. [1.2, 1.4, 0.7]

Solution

  1. Step 1: Add corresponding semantic and keyword scores

    0.8+0.6=1.4, 0.5+0.7=1.2, 0.3+0.4=0.7
  2. Step 2: Create list of summed scores

    final_scores = [1.4, 1.2, 0.7]
  3. Final Answer:

    [1.4, 1.2, 0.7] -> Option A
  4. Quick Check:

    Sum pairs = [1.4, 1.2, 0.7] [OK]
Hint: Add pairs element-wise for final scores [OK]
Common Mistakes:
  • Multiplying instead of adding scores
  • Mixing order of scores in zip
  • Confusing subtraction with addition
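The sums in this question happen to print as clean decimals, but floating-point addition does not always do so (0.1 + 0.2 is the classic counterexample). When checking combined scores in code, a tolerance-based comparison is safer than exact equality. The sketch below uses the standard library's `math.isclose`; the `scores_close` helper is a hypothetical name introduced for this example.

```python
import math

semantic_scores = [0.8, 0.5, 0.3]
keyword_scores = [0.6, 0.7, 0.4]
final_scores = [s + k for s, k in zip(semantic_scores, keyword_scores)]


def scores_close(actual, expected, tol=1e-9):
    """Compare two score lists element-wise within an absolute tolerance."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, abs_tol=tol) for a, e in zip(actual, expected)
    )


# The quiz answer holds within tolerance.
print(scores_close(final_scores, [1.4, 1.2, 0.7]))

# Exact equality fails for this well-known case, but the tolerant check passes.
print(0.1 + 0.2 == 0.3)
print(scores_close([0.1 + 0.2], [0.3]))
```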
4. Identify the error in this hybrid search scoring code:
semantic_scores = [0.9, 0.4, 0.7]
keyword_scores = [0.5, 0.6]
final_scores = [s + k for s, k in zip(semantic_scores, keyword_scores)]
print(final_scores)
medium
A. Adding scores should use multiplication instead.
B. Using zip causes a syntax error here.
C. Lists have different lengths causing missing scores.
D. The print statement is missing parentheses.

Solution

  1. Step 1: Check list lengths

    semantic_scores has 3 items; keyword_scores has 2 items.
  2. Step 2: Understand zip behavior

    zip stops at the length of the shortest list, so the last semantic score is silently dropped without raising an error.
  3. Final Answer:

    Lists have different lengths causing missing scores. -> Option C
  4. Quick Check:

    Unequal list lengths truncate results [OK]
Hint: Ensure lists are same length before zipping [OK]
Common Mistakes:
  • Assuming zip pads shorter list automatically
  • Thinking zip causes syntax error
  • Believing multiplication is required for hybrid scores
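Because the truncation is silent, it helps to fail loudly when the lists disagree. The sketch below adds an explicit length check; `add_scores_checked` is a hypothetical helper name. On Python 3.10 and later, `zip(semantic_scores, keyword_scores, strict=True)` raises a `ValueError` in the same situation without the manual check.

```python
def add_scores_checked(semantic_scores, keyword_scores):
    """Add scores pairwise, refusing inputs of different lengths."""
    if len(semantic_scores) != len(keyword_scores):
        raise ValueError(
            f"score lists differ in length: "
            f"{len(semantic_scores)} semantic vs {len(keyword_scores)} keyword"
        )
    return [s + k for s, k in zip(semantic_scores, keyword_scores)]


semantic_scores = [0.9, 0.4, 0.7]
keyword_scores = [0.5, 0.6]

# Plain zip drops the third semantic score without any warning.
truncated = [s + k for s, k in zip(semantic_scores, keyword_scores)]
print(len(truncated))  # 2

# The checked version surfaces the mismatch instead.
try:
    add_scores_checked(semantic_scores, keyword_scores)
except ValueError as exc:
    print(f"Rejected: {exc}")
```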
5. You want to improve a hybrid search system by weighting semantic similarity twice as much as keyword matching. Which formula correctly applies this?
hard
A. final_score = semantic_score + 2 * keyword_score
B. final_score = 2 * semantic_score + keyword_score
C. final_score = semantic_score * keyword_score * 2
D. final_score = (semantic_score + keyword_score) / 2

Solution

  1. Step 1: Identify weighting requirement

    Semantic similarity should count double compared to keyword score.
  2. Step 2: Apply weights in formula

    Multiply semantic_score by 2, then add keyword_score.
  3. Final Answer:

    final_score = 2 * semantic_score + keyword_score -> Option B
  4. Quick Check:

    Semantic weighted double = 2 * semantic + keyword [OK]
Hint: Multiply semantic score by 2 before adding keyword [OK]
Common Mistakes:
  • Weighting keyword score instead of semantic
  • Multiplying all scores together
  • Dividing sum instead of weighting
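The effect of the 2:1 weighting is easiest to see when it changes a ranking. The sketch below is illustrative only: `weighted_hybrid_score`, its parameter names, and the two sample documents are assumptions made for this example.

```python
def weighted_hybrid_score(semantic_score, keyword_score,
                          semantic_weight=2.0, keyword_weight=1.0):
    """Option B generalized: weight each signal, then add."""
    return semantic_weight * semantic_score + keyword_weight * keyword_score


# (semantic_score, keyword_score) per document; made-up values.
doc_scores = {
    "docA": (0.9, 0.2),  # strong semantic match, weak keyword match
    "docB": (0.5, 0.8),  # weaker semantic match, strong keyword match
}

unweighted_ranking = sorted(
    doc_scores, key=lambda d: sum(doc_scores[d]), reverse=True
)
weighted_ranking = sorted(
    doc_scores, key=lambda d: weighted_hybrid_score(*doc_scores[d]), reverse=True
)

print("unweighted:", unweighted_ranking)
print("weighted:  ", weighted_ranking)
```

With equal weights, docB ranks first on the strength of its keyword match. Doubling the semantic weight moves docA to the top, which is the behavior the question asks for.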