Retrieval strategies select the most useful items from a large pool of candidates. Each strategy ranks the candidates by how well they match the query.
Retrieval strategies (similarity, MMR, hybrid) in Agentic AI
class RetrievalStrategy:
    def retrieve(self, query, documents):
        pass


class SimilarityRetrieval(RetrievalStrategy):
    def retrieve(self, query, documents):
        # Return documents ranked by similarity to query
        pass


class MMRRetrieval(RetrievalStrategy):
    def retrieve(self, query, documents, lambda_param=0.5):
        # Return documents balancing similarity and diversity
        pass


class HybridRetrieval(RetrievalStrategy):
    def __init__(self, strategies):
        self.strategies = strategies

    def retrieve(self, query, documents):
        # Combine results from multiple strategies
        pass
Similarity retrieval ranks items by how close they are to the query.
MMR (Maximal Marginal Relevance) balances relevance and variety to avoid repeats.
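The relevance/diversity trade-off can be written as a single score per candidate. The form below matches the scoring used by the example program in this lesson. The symbols are notation chosen here for illustration: q is the query, C the set of remaining candidates, S the set of documents already selected, and sim the cosine similarity.

```latex
\mathrm{MMR}(d) = \lambda \,\mathrm{sim}(q, d) \;-\; (1 - \lambda)\,\max_{s \in S} \mathrm{sim}(d, s), \qquad d \in C
```

With \lambda = 1 the score reduces to plain similarity ranking; smaller values of \lambda put more weight on avoiding documents that resemble those already picked. When S is empty, the max term is taken to be 0.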
similarity_strategy = SimilarityRetrieval()
results = similarity_strategy.retrieve('apple', documents)

mmr_strategy = MMRRetrieval()
results = mmr_strategy.retrieve('apple', documents, lambda_param=0.7)

hybrid_strategy = HybridRetrieval([SimilarityRetrieval(), MMRRetrieval()])
results = hybrid_strategy.retrieve('apple', documents)

This program shows three retrieval strategies: similarity, MMR, and hybrid. It uses simple vectors to represent documents and a query. It prints the ranking and scores for each strategy.
import numpy as np


def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot_product / (norm1 * norm2)


class RetrievalStrategy:
    def retrieve(self, query_vector, document_vectors):
        pass


class SimilarityRetrieval(RetrievalStrategy):
    def retrieve(self, query_vector, document_vectors):
        scores = [cosine_similarity(query_vector, doc_vec) for doc_vec in document_vectors]
        ranked_indices = np.argsort(scores)[::-1]
        return ranked_indices, [scores[i] for i in ranked_indices]


class MMRRetrieval(RetrievalStrategy):
    def retrieve(self, query_vector, document_vectors, lambda_param=0.5, top_k=3):
        selected = []
        candidates = list(range(len(document_vectors)))
        scores = [cosine_similarity(query_vector, doc_vec) for doc_vec in document_vectors]
        while len(selected) < top_k and candidates:
            mmr_scores = []
            for candidate in candidates:
                sim_to_query = scores[candidate]
                sim_to_selected = 0
                if selected:
                    sim_to_selected = max(
                        cosine_similarity(document_vectors[candidate], document_vectors[sel])
                        for sel in selected
                    )
                mmr_score = lambda_param * sim_to_query - (1 - lambda_param) * sim_to_selected
                mmr_scores.append((mmr_score, candidate))
            mmr_scores.sort(reverse=True)
            best = mmr_scores[0][1]
            selected.append(best)
            candidates.remove(best)
        return selected, [scores[i] for i in selected]


class HybridRetrieval(RetrievalStrategy):
    def __init__(self, strategies):
        self.strategies = strategies

    def retrieve(self, query_vector, document_vectors):
        combined_scores = np.zeros(len(document_vectors))
        for strategy in self.strategies:
            indices, scores = strategy.retrieve(query_vector, document_vectors)
            for idx, score in zip(indices, scores):
                combined_scores[idx] += score
        ranked_indices = np.argsort(combined_scores)[::-1]
        return ranked_indices, [combined_scores[i] for i in ranked_indices]


# Sample data: 4 documents as vectors
documents = [
    np.array([1, 0, 0]),
    np.array([0, 1, 0]),
    np.array([0, 0, 1]),
    np.array([1, 1, 0]),
]
query = np.array([1, 0.5, 0])

print('Similarity Retrieval:')
sim_strategy = SimilarityRetrieval()
indices, scores = sim_strategy.retrieve(query, documents)
for i, score in zip(indices, scores):
    print(f'Doc {i} score: {score:.2f}')

print('\nMMR Retrieval:')
mmr_strategy = MMRRetrieval()
indices, scores = mmr_strategy.retrieve(query, documents, lambda_param=0.7, top_k=3)
for i, score in zip(indices, scores):
    print(f'Doc {i} score: {score:.2f}')

print('\nHybrid Retrieval:')
hybrid_strategy = HybridRetrieval([SimilarityRetrieval(), MMRRetrieval()])
indices, scores = hybrid_strategy.retrieve(query, documents)
for i, score in zip(indices, scores):
    print(f'Doc {i} combined score: {score:.2f}')
Similarity retrieval is fast but may return several near-duplicate results.
MMR produces more diverse results by penalizing candidates that resemble documents already selected.
Hybrid retrieval combines the scores of several strategies, so that the strengths of one can offset the weaknesses of another.
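The difference between plain similarity ranking and MMR is easiest to see when the collection contains near-duplicates. The following is a minimal sketch, not a definitive implementation: the helper names (`cosine`, `top_k_similarity`, `top_k_mmr`) and the two-dimensional vectors are made up for illustration, with documents 0 and 1 chosen to be almost identical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))


def top_k_similarity(query, docs, k):
    """Indices of the k documents most similar to the query."""
    scores = [cosine(query, d) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_mmr(query, docs, k, lambda_param=0.5):
    """Greedy MMR: trade relevance against similarity to already-picked docs."""
    relevance = [cosine(query, d) for d in docs]
    selected, candidates = [], list(range(len(docs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy is 0.0 while nothing has been selected yet.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_param * relevance[i] - (1 - lambda_param) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical data: docs 0 and 1 are near-duplicates, doc 2 covers another topic.
docs = [
    np.array([1.0, 0.0]),
    np.array([0.99, 0.05]),
    np.array([0.0, 1.0]),
]
query = np.array([1.0, 0.3])

sim_top2 = top_k_similarity(query, docs, k=2)
mmr_top2 = top_k_mmr(query, docs, k=2, lambda_param=0.5)
print("similarity:", sim_top2)  # both near-duplicates
print("MMR:", mmr_top2)         # one near-duplicate plus the different doc
```

With these vectors, similarity ranking returns the two near-duplicates, while MMR keeps the best match and then prefers the document on a different topic. Raising `lambda_param` toward 1 moves the MMR result back toward the plain similarity ranking.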
Retrieval strategies find the best matching information for a query.
Similarity ranks by closeness, MMR balances relevance and diversity.
Hybrid methods combine strategies to improve search quality.
Practice
Solution
Step 1: Understand similarity-based retrieval
Similarity-based retrieval ranks results by how close or similar they are to the query, focusing only on relevance.
Step 2: Compare with other strategies
MMR balances relevance and diversity, hybrid combines methods, and random is unrelated.
Final Answer:
Similarity-based retrieval -> Option C
Quick Check:
Similarity = closeness only [OK]
- Confusing MMR with similarity
- Thinking hybrid is only similarity
- Choosing random as a valid strategy
Solution
Step 1: Define MMR
MMR is designed to balance relevance to the query and diversity among the results to avoid redundancy.
Step 2: Eliminate incorrect options
Random selection is unrelated, keyword matching is too narrow, and combining without weighting is not MMR.
Final Answer:
Balances relevance and diversity in retrieval -> Option D
Quick Check:
MMR = relevance + diversity [OK]
- Thinking MMR is random
- Assuming MMR uses only keywords
- Believing MMR combines methods blindly
results = []
for doc in documents:
    sim_score = similarity(query, doc)
    mmr_score = mmr(query, doc, results)
    combined_score = 0.6 * sim_score + 0.4 * mmr_score
    results.append((doc, combined_score))
results.sort(key=lambda x: x[1], reverse=True)
print([doc for doc, score in results[:3]])
What does this code output?
Solution
Step 1: Analyze score calculation
The code calculates a combined score using 60% similarity and 40% MMR for each document.
Step 2: Understand sorting and output
Documents are sorted by this combined score in descending order, then the top 3 are printed.
Final Answer:
Top 3 documents ranked by combined similarity and MMR scores -> Option A
Quick Check:
Hybrid = combined scores [OK]
- Ignoring MMR score in combined score
- Assuming sorting by similarity only
- Thinking output is random
def mmr(query, docs, selected):
    scores = []
    for doc in docs:
        relevance = similarity(query, doc)
        diversity = min([similarity(doc, s) for s in selected])
        score = relevance - 0.5 * diversity
        scores.append((doc, score))
    return max(scores, key=lambda x: x[1])[0]
What is the main error causing a crash when selected is empty?
Solution
Step 1: Identify cause of crash
When selected is empty, the list inside min() is empty, causing a ValueError.
Step 2: Understand min() behavior
min() cannot operate on empty lists, so the code crashes at that line.
Final Answer:
Using min() on an empty list causes an error -> Option A
Quick Check:
min(empty list) = error [OK]
- Blaming max() instead of min()
- Ignoring empty list edge case
- Assuming similarity is undefined
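One possible way to repair the `mmr` function from the question is to give the aggregate a default value for the empty case. The sketch below rests on some assumptions: `similarity` is a hypothetical stand-in (word-set Jaccard overlap) because the question does not define it, the `diversity_weight` parameter is added here for illustration, and the redundancy term uses `max` as in the full example program of this lesson rather than the `min` from the question.

```python
def similarity(a, b):
    """Hypothetical stand-in: Jaccard overlap between two sets of words."""
    a, b = set(a.split()), set(b.split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def mmr(query, docs, selected, diversity_weight=0.5):
    """Return the candidate with the best relevance-minus-redundancy score."""
    if not docs:
        return None  # nothing to choose from
    scores = []
    for doc in docs:
        relevance = similarity(query, doc)
        # default=0.0 covers the empty-selected case that crashed the original.
        redundancy = max((similarity(doc, s) for s in selected), default=0.0)
        scores.append((doc, relevance - diversity_weight * redundancy))
    return max(scores, key=lambda x: x[1])[0]


# Hypothetical documents, for illustration only.
docs = ["red apple pie", "red apple tart", "green pear salad"]
first = mmr("red apple", docs, selected=[])
second = mmr("red apple", [d for d in docs if d != first], selected=[first])
print(first, "|", second)
```

The first call runs with an empty `selected` list and no longer raises; the second call shows the function being reused once a document has been picked.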
Solution
Step 1: Understand the goal
Balancing relevance and diversity requires combining both similarity and MMR scores meaningfully.
Step 2: Evaluate options
Using only similarity or zero diversity weight ignores diversity; random shuffling loses relevance order.
Step 3: Best approach
Combining similarity and MMR with adjustable weights allows tuning the balance effectively.
Final Answer:
Combine similarity and MMR scores with adjustable weights -> Option B
Quick Check:
Hybrid weighted combination = best balance [OK]
- Ignoring diversity by using similarity only
- Setting diversity weight to zero in MMR
- Randomizing results without scoring
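The weighted combination described in this solution can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: `hybrid_scores`, `rank`, and the `sim_weight` parameter are names invented here, the per-document scores are made-up numbers, and the default of 0.6 simply mirrors the 60/40 split from the earlier question.

```python
def hybrid_scores(sim_scores, mmr_scores, sim_weight=0.6):
    """Weighted sum of two per-document score lists; the weights sum to 1."""
    if len(sim_scores) != len(mmr_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= sim_weight <= 1.0:
        raise ValueError("sim_weight must be between 0 and 1")
    mmr_weight = 1.0 - sim_weight
    return [sim_weight * s + mmr_weight * m for s, m in zip(sim_scores, mmr_scores)]


def rank(scores):
    """Document indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical per-document scores, for illustration only.
sim_scores = [0.9, 0.8, 0.3]
mmr_scores = [0.1, 0.5, 0.4]

relevance_heavy = rank(hybrid_scores(sim_scores, mmr_scores, sim_weight=0.9))
diversity_heavy = rank(hybrid_scores(sim_scores, mmr_scores, sim_weight=0.2))
print(relevance_heavy, diversity_heavy)
```

Changing `sim_weight` alone reorders the results: a high value keeps the similarity order, while a low value promotes the documents that score well on MMR. In practice the two score lists would usually need to be on comparable scales before they are mixed.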
