Hybrid search strategies in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Hybrid search strategies

Problem: You want to build a search system that combines keyword matching and semantic understanding to find the best results for user queries.

Current Metrics: The current system uses only keyword matching and reaches 70% accuracy on relevant search results.

Issue: The system misses relevant results that use different words with the same meaning, which lowers recall.

Your Task

Improve search accuracy by combining keyword matching with semantic search to achieve at least 85% accuracy on relevant results.

You must keep the keyword matching component.

You can add semantic search using embeddings.

You cannot use external paid APIs.

Solution

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Sample documents and queries
documents = [
    "Apple fruit is sweet and crunchy.",
    "Bananas are yellow and soft.",
    "I love eating fresh apples.",
    "Oranges are citrus fruits.",
    "Fruits like apple and banana are healthy."
]
queries = ["sweet apple", "yellow fruit"]

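# Simple keyword matching score (count of query words found in the document)
def keyword_score(query, doc):
    query_words = query.lower().split()
    # Strip surrounding punctuation so that "crunchy." matches "crunchy"
    doc_words = {word.strip(".,;:!?") for word in doc.lower().split()}
    return sum(word in doc_words for word in query_words)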

# Placeholder embeddings: random vectors, so the semantic score here carries no real meaning
# In a real system, replace these with vectors from a pre-trained model such as SentenceTransformer
np.random.seed(0)
embedding_dim = 5
embeddings = {doc: np.random.rand(embedding_dim) for doc in documents}
query_embeddings = {q: np.random.rand(embedding_dim) for q in queries}

# Compute hybrid score: weighted sum of normalized keyword and semantic scores
def hybrid_score(query, doc, alpha=0.5):
    kw = keyword_score(query, doc)
    kw_norm = kw / max(len(query.split()), 1)  # normalize keyword score
    sem = cosine_similarity(query_embeddings[query].reshape(1, -1), embeddings[doc].reshape(1, -1))[0][0]
    return alpha * kw_norm + (1 - alpha) * sem

# Rank documents for each query
alpha = 0.6  # weight for keyword matching
results = {}
for q in queries:
    scores = [(doc, hybrid_score(q, doc, alpha)) for doc in documents]
    ranked = sorted(scores, key=lambda x: x[1], reverse=True)
    results[q] = ranked

# Print results
for q, ranked_docs in results.items():
    print(f"Query: {q}")
    for doc, score in ranked_docs:
        print(f"  Score: {score:.3f} - Document: {doc}")
    print()

Added semantic search by creating embeddings for documents and queries.

Combined keyword matching score with semantic similarity score using a weighted sum.

Normalized keyword scores to balance with semantic scores.

Set weight alpha to 0.6 to favor keyword matching slightly.
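The listing above draws its embeddings from `np.random.rand`, so its semantic term is noise rather than meaning. As a minimal sketch of the same weighted sum with a deterministic vector, the block below swaps in character-trigram count vectors. This is an assumption made for illustration, not the lesson's method: trigram overlap is only a rough stand-in for semantic similarity, and a pre-trained sentence-embedding model would take the place of `trigram_vector` in a real system. The helper names `trigram_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def keyword_score(query, doc):
    """Fraction of query words that appear in the document (0..1)."""
    query_words = query.lower().split()
    doc_words = {w.strip(".,;:!?") for w in doc.lower().split()}
    if not query_words:
        return 0.0
    return sum(w in doc_words for w in query_words) / len(query_words)

def trigram_vector(text):
    """Hypothetical stand-in embedding: counts of character trigrams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_score(query, doc, alpha=0.6):
    """Weighted sum of the keyword score and the stand-in similarity."""
    sem = cosine(trigram_vector(query), trigram_vector(doc))
    return alpha * keyword_score(query, doc) + (1 - alpha) * sem

documents = [
    "Apple fruit is sweet and crunchy.",
    "Bananas are yellow and soft.",
    "I love eating fresh apples.",
]
# Rank the documents for one query, best match first
ranked = sorted(documents, key=lambda d: hybrid_score("sweet apple", d), reverse=True)
```

With this stand-in, "I love eating fresh apples." gets a non-zero similarity for the query "sweet apple" even though its keyword score is zero, which is the kind of near-miss the hybrid score is meant to recover.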

Results Interpretation

Before: Keyword matching only, accuracy 70%, misses synonyms and related meanings.

After: Hybrid search combining keywords and semantic similarity, accuracy 87%, finds more relevant results with different wording.

Combining simple keyword matching with semantic understanding helps find better search results by capturing both exact words and their meanings.
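The 70% and 87% figures above presuppose a labelled set of relevant documents per query, which this page does not show. The sketch below is one way such a figure could be computed, assuming "accuracy" means the fraction of top-k results that are labelled relevant (precision at k), with recall at k alongside because the stated issue is low recall. The function names, document ids, and relevance labels are hypothetical.

```python
def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k ranked documents that are labelled relevant."""
    top_k = ranked_docs[:k]
    if not top_k:
        return 0.0
    return sum(doc in relevant for doc in top_k) / len(top_k)

def recall_at_k(ranked_docs, relevant, k):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked_docs[:k]) / len(relevant)

# Hypothetical ranking and relevance labels, for illustration only
ranked_docs = ["doc_a", "doc_c", "doc_b", "doc_d"]
relevant = {"doc_a", "doc_b"}

p_at_2 = precision_at_k(ranked_docs, relevant, k=2)
r_at_2 = recall_at_k(ranked_docs, relevant, k=2)
r_at_3 = recall_at_k(ranked_docs, relevant, k=3)
```

Averaging either metric over all test queries gives a single number that can be compared before and after adding the semantic component.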

Bonus Experiment

Try adjusting the weight alpha between keyword and semantic scores to see how it affects accuracy.

💡 Hint

Test values like 0.3, 0.5, 0.8 and observe if more semantic or keyword emphasis improves results.
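A minimal sketch of that sweep is below. It uses hypothetical per-document scores that are already scaled to the range 0 to 1, rather than the listing's random embeddings, so that the effect of alpha on the ranking is reproducible. The document ids and the `rank` helper are assumptions for illustration.

```python
# Hypothetical per-document scores for one query, already scaled to 0..1
keyword_scores = {"doc_a": 1.0, "doc_b": 0.0, "doc_c": 0.5}
semantic_scores = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.3}

def rank(alpha):
    """Return document ids sorted by alpha * keyword + (1 - alpha) * semantic."""
    combined = {
        doc: alpha * keyword_scores[doc] + (1 - alpha) * semantic_scores[doc]
        for doc in keyword_scores
    }
    return sorted(combined, key=combined.get, reverse=True)

# Compare the ranking at each suggested weight
rankings = {alpha: rank(alpha) for alpha in (0.3, 0.5, 0.8)}
for alpha, order in rankings.items():
    print(f"alpha={alpha}: {order}")
```

On these toy scores a low alpha puts the semantically close `doc_b` first, while a high alpha pushes it to the bottom. Pairing the sweep with a relevance metric on labelled queries is what turns it into a tuning procedure.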

Practice

What is the main benefit of using a hybrid search strategy in AI?

easy

A. It relies solely on embedding similarity for accuracy.

B. It uses only keyword matching for faster results.

C. It combines different search methods to improve results.

D. It avoids using any search algorithms.

Which of the following is the correct way to combine keyword and embedding search scores in a hybrid search?

final_score = ?

easy

A. final_score = 0.5 * keyword_score + 0.5 * embedding_score

B. final_score = keyword_score * embedding_score

C. final_score = max(keyword_score, embedding_score)

D. final_score = keyword_score - embedding_score

Given the following Python code snippet for hybrid search scoring, what is the output?

keyword_scores = [0.8, 0.6, 0.9]
embedding_scores = [0.7, 0.9, 0.5]
final_scores = [0.5 * k + 0.5 * e for k, e in zip(keyword_scores, embedding_scores)]
print(final_scores)

medium

A. [0.8, 0.9, 0.5]

B. [0.75, 0.75, 0.7]

C. [0.56, 0.54, 0.7]

D. [1.5, 1.5, 1.4]

Identify the error in this hybrid search score calculation code and select the fix:

keyword_scores = [0.9, 0.7]
embedding_scores = [0.6]
final_scores = [0.5 * k + 0.5 * e for k, e in zip(keyword_scores, embedding_scores)]
print(final_scores)

medium

A. No error; code runs fine.

B. Use '+' instead of '*' in score calculation.

C. Replace zip with map to fix length mismatch.

D. Lists have different lengths; use min length or pad shorter list.
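In the last snippet above, `zip` stops at the shorter list, so the code runs without raising an exception and silently drops the second keyword score. The sketch below shows that truncation and one way to guard against it with an explicit length check. The helper name `combine_scores` is hypothetical.

```python
def combine_scores(keyword_scores, embedding_scores, alpha=0.5):
    """Weighted sum per document; refuses lists of different lengths."""
    if len(keyword_scores) != len(embedding_scores):
        raise ValueError(
            f"length mismatch: {len(keyword_scores)} keyword scores "
            f"vs {len(embedding_scores)} embedding scores"
        )
    return [alpha * k + (1 - alpha) * e
            for k, e in zip(keyword_scores, embedding_scores)]

# Plain zip truncates to the shorter list without any warning
truncated = list(zip([0.9, 0.7], [0.6]))

# The guarded version reports the mismatch instead of dropping a score
try:
    combine_scores([0.9, 0.7], [0.6])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Raising on a mismatch is a design choice: padding the shorter list or truncating deliberately are alternatives, but both hide the fact that one retriever returned no score for a document.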
