Prompt Engineering / GenAI · ML · ~20 mins

Hybrid search strategies in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Hybrid search strategies
Problem: You want to build a search system that combines keyword matching and semantic understanding to find the best results for user queries.
Current Metrics: The current system uses only keyword matching and reaches 70% accuracy on relevant search results.
Issue: The system misses relevant results that use different words with the same meaning, which lowers recall.
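The issue can be reproduced in a few lines. The sketch below reuses the keyword scorer from the solution further down; the query and document come from the sample data, and the pairing is chosen only for illustration.

```python
def keyword_score(query, doc):
    """Count query words that appear verbatim among the document's words."""
    query_words = query.lower().split()
    doc_words = doc.lower().split()
    return sum(word in doc_words for word in query_words)

query = "sweet apple"
relevant_doc = "I love eating fresh apples."

# "apples." (plural, trailing period) is a different token from "apple",
# so a clearly relevant document receives a keyword score of zero.
print(keyword_score(query, relevant_doc))  # 0
```

A document about apples is ranked no higher than one about oranges, which is the recall gap the hybrid approach is meant to close.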
Your Task
Improve search accuracy by combining keyword matching with semantic search to achieve at least 85% accuracy on relevant results.
You must keep the keyword matching component.
You can add semantic search using embeddings.
You cannot use external paid APIs.
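As a rough illustration of staying within these constraints, the sketch below builds vectors locally with no API calls. It is an assumption made for this example, not the page's method: character trigram counts are only a dependency-free stand-in that tolerates surface variations such as "apple" and "apples". They do not capture meaning the way a locally run pre-trained embedding model would. The helper names `char_ngrams` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Return a bag of character n-grams for lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "sweet apple"
doc_close = "I love eating fresh apples."
doc_far = "Oranges are citrus fruits."

# The apple document shares trigrams such as "app" and "ple" with the query;
# the orange document shares none.
print(cosine(char_ngrams(query), char_ngrams(doc_close)))
print(cosine(char_ngrams(query), char_ngrams(doc_far)))
```

Swapping `char_ngrams` for a real sentence-embedding model leaves the rest of the pipeline unchanged, since only the similarity value feeds into the hybrid score.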
Solution
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Sample documents and queries
documents = [
    "Apple fruit is sweet and crunchy.",
    "Bananas are yellow and soft.",
    "I love eating fresh apples.",
    "Oranges are citrus fruits.",
    "Fruits like apple and banana are healthy."
]
queries = ["sweet apple", "yellow fruit"]

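# Simple keyword matching score (count of query words found in the document)
def keyword_score(query, doc):
    # Strip surrounding punctuation so that "crunchy." matches "crunchy"
    punctuation = ".,;:!?\"'()"
    query_words = [w.strip(punctuation) for w in query.lower().split()]
    doc_words = {w.strip(punctuation) for w in doc.lower().split()}
    return sum(word in doc_words for word in query_words)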

# Placeholder embeddings: random vectors stand in for a real model,
# so the semantic scores below carry no actual meaning.
# In a real system, use a pre-trained model such as SentenceTransformer
np.random.seed(0)
embedding_dim = 5
embeddings = {doc: np.random.rand(embedding_dim) for doc in documents}
query_embeddings = {q: np.random.rand(embedding_dim) for q in queries}

# Compute hybrid score: weighted sum of normalized keyword and semantic scores
def hybrid_score(query, doc, alpha=0.5):
    kw = keyword_score(query, doc)
    kw_norm = kw / max(len(query.split()), 1)  # normalize keyword score
    sem = cosine_similarity(query_embeddings[query].reshape(1, -1), embeddings[doc].reshape(1, -1))[0][0]
    return alpha * kw_norm + (1 - alpha) * sem

# Rank documents for each query
alpha = 0.6  # weight for keyword matching
results = {}
for q in queries:
    scores = [(doc, hybrid_score(q, doc, alpha)) for doc in documents]
    ranked = sorted(scores, key=lambda x: x[1], reverse=True)
    results[q] = ranked

# Print results
for q, ranked_docs in results.items():
    print(f"Query: {q}")
    for doc, score in ranked_docs:
        print(f"  Score: {score:.3f} - Document: {doc}")
    print()
Added semantic search by creating embeddings for documents and queries.
Combined keyword matching score with semantic similarity score using a weighted sum.
Normalized keyword scores by query length so they fall between 0 and 1, on a scale comparable to the semantic scores.
Set weight alpha to 0.6 to favor keyword matching slightly.
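The weighted sum described in these steps can be checked by hand. The sketch below mirrors the formula from the solution; the input values (one of two query words matched, a cosine similarity of 0.8) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def hybrid_score(kw_matches, query_len, sem, alpha):
    """Weighted sum of a normalized keyword score and a semantic score."""
    kw_norm = kw_matches / max(query_len, 1)  # fraction of query words matched
    return alpha * kw_norm + (1 - alpha) * sem

# Hypothetical inputs: 1 of 2 query words matched, cosine similarity 0.8.
score = hybrid_score(kw_matches=1, query_len=2, sem=0.8, alpha=0.6)
print(round(score, 2))  # 0.6 * 0.5 + 0.4 * 0.8 = 0.62
```

With `alpha=1.0` the score reduces to pure keyword matching, and with `alpha=0.0` it reduces to pure semantic similarity.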
Results Interpretation

Before: Keyword matching only, accuracy 70%, misses synonyms and related meanings.

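After: Hybrid search combining keywords and semantic similarity, reported accuracy 87%; finds more relevant results that use different wording. Reaching this requires real embeddings, since the random placeholder vectors in the sample code carry no meaning.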

Combining simple keyword matching with semantic understanding helps find better search results by capturing both exact words and their meanings.
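The page does not say how its accuracy figures were measured. One common way to compare rankings is precision and recall at a cutoff k, sketched below. The document IDs, the ranking, and the relevance labels are all hypothetical and serve only to show the calculation.

```python
def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k ranked documents that are labeled relevant."""
    top_k = ranked_docs[:k]
    return sum(doc in relevant for doc in top_k) / k

def recall_at_k(ranked_docs, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    top_k = ranked_docs[:k]
    return sum(doc in relevant for doc in top_k) / len(relevant)

# Hypothetical ranking for one query and hand-labeled relevant documents.
ranked = ["d1", "d5", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}

print(precision_at_k(ranked, relevant, k=3))  # 2 of the top 3 are relevant
print(recall_at_k(ranked, relevant, k=3))     # 2 of the 3 relevant docs found
```

Averaging these values over a set of labeled queries, once for the keyword-only ranking and once for the hybrid ranking, gives a before-and-after comparison on equal terms.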
Bonus Experiment
Try adjusting the weight alpha between keyword and semantic scores to see how it affects accuracy.
💡 Hint
Test values like 0.3, 0.5, 0.8 and observe if more semantic or keyword emphasis improves results.
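A small sweep over these values might look like the sketch below. The three candidates and their (keyword, semantic) score pairs are hypothetical, picked so that the effect of the weight is visible; with real data the scores would come from the scoring functions in the solution.

```python
# Hypothetical (keyword, semantic) score pairs, both already scaled to [0, 1].
candidates = {
    "exact wording, off-topic": (1.0, 0.2),
    "different wording, on-topic": (0.0, 0.9),
    "partial match, related": (0.5, 0.6),
}

def rank(alpha):
    """Rank candidates by alpha * keyword + (1 - alpha) * semantic, best first."""
    scored = {
        name: alpha * kw + (1 - alpha) * sem
        for name, (kw, sem) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

for alpha in (0.3, 0.5, 0.8):
    print(alpha, rank(alpha))
```

At 0.3 the on-topic document with different wording ranks first, while at 0.5 and 0.8 the exact-wording match takes over. Which setting is better depends on the labeled relevance data used for evaluation.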