Retrieval strategies determine which pieces of stored information best answer a query. Given many candidate documents, they rank and select the ones most relevant to what the user asked.
Retrieval strategies (similarity, MMR, hybrid) in Agentic AI
Introduction
Retrieval strategies are useful in situations such as:
When searching for documents that closely match a question.
When you want diverse but relevant answers and need to avoid repetition.
When combining different methods to improve search results.
When building chatbots that must find helpful facts quickly.
When filtering large datasets to show only the best matches.
Syntax
class RetrievalStrategy:
    def retrieve(self, query, documents):
        pass

class SimilarityRetrieval(RetrievalStrategy):
    def retrieve(self, query, documents):
        # Return documents ranked by similarity to the query
        pass

class MMRRetrieval(RetrievalStrategy):
    def retrieve(self, query, documents, lambda_param=0.5):
        # Return documents balancing similarity and diversity
        pass

class HybridRetrieval(RetrievalStrategy):
    def __init__(self, strategies):
        self.strategies = strategies

    def retrieve(self, query, documents):
        # Combine results from multiple strategies
        pass
Similarity retrieval ranks items by how close they are to the query.
MMR (Maximal Marginal Relevance) balances relevance and variety to avoid repeats.
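The MMR trade-off described above can be sketched as a single scoring rule: score = lambda * sim(doc, query) - (1 - lambda) * max sim(doc, already-selected docs). The function below is an illustrative sketch of that rule (the names are hypothetical, not from a specific library):

```python
# Illustrative MMR scoring step (hypothetical names, not a library API).
def mmr_score(sim_to_query, sims_to_selected, lam=0.5):
    # lam * relevance - (1 - lam) * redundancy
    redundancy = max(sims_to_selected, default=0.0)
    return lam * sim_to_query - (1 - lam) * redundancy

# A highly relevant candidate that near-duplicates a selected doc scores low,
# while a less relevant but novel candidate scores higher:
print(mmr_score(0.9, [0.95], lam=0.5))  # penalized for redundancy
print(mmr_score(0.7, [0.1], lam=0.5))   # rewarded for novelty
```

With lambda close to 1 the rule behaves like plain similarity retrieval; with lambda close to 0 it strongly favors diversity.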
Examples
Retrieve documents most similar to 'apple'.
similarity_strategy = SimilarityRetrieval()
results = similarity_strategy.retrieve('apple', documents)

Retrieve documents balancing similarity and diversity with lambda 0.7.
mmr_strategy = MMRRetrieval()
results = mmr_strategy.retrieve('apple', documents, lambda_param=0.7)
Combine similarity and MMR strategies to get better results.
hybrid_strategy = HybridRetrieval([SimilarityRetrieval(), MMRRetrieval()])
results = hybrid_strategy.retrieve('apple', documents)

Sample Model
This program shows three retrieval strategies: similarity, MMR, and hybrid. It uses simple vectors to represent documents and a query. It prints the ranking and scores for each strategy.
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot_product / (norm1 * norm2)

class RetrievalStrategy:
    def retrieve(self, query_vector, document_vectors):
        pass

class SimilarityRetrieval(RetrievalStrategy):
    def retrieve(self, query_vector, document_vectors):
        # Rank all documents by cosine similarity to the query
        scores = [cosine_similarity(query_vector, doc_vec) for doc_vec in document_vectors]
        ranked_indices = np.argsort(scores)[::-1]
        return ranked_indices, [scores[i] for i in ranked_indices]

class MMRRetrieval(RetrievalStrategy):
    def retrieve(self, query_vector, document_vectors, lambda_param=0.5, top_k=3):
        # Greedily select top_k documents, trading relevance against redundancy
        selected = []
        candidates = list(range(len(document_vectors)))
        scores = [cosine_similarity(query_vector, doc_vec) for doc_vec in document_vectors]
        while len(selected) < top_k and candidates:
            mmr_scores = []
            for candidate in candidates:
                sim_to_query = scores[candidate]
                sim_to_selected = 0
                if selected:
                    sim_to_selected = max(
                        cosine_similarity(document_vectors[candidate], document_vectors[sel])
                        for sel in selected
                    )
                mmr_score = lambda_param * sim_to_query - (1 - lambda_param) * sim_to_selected
                mmr_scores.append((mmr_score, candidate))
            mmr_scores.sort(reverse=True)
            best = mmr_scores[0][1]
            selected.append(best)
            candidates.remove(best)
        return selected, [scores[i] for i in selected]

class HybridRetrieval(RetrievalStrategy):
    def __init__(self, strategies):
        self.strategies = strategies

    def retrieve(self, query_vector, document_vectors):
        # Sum the scores each strategy assigns, then rank by the combined score
        combined_scores = np.zeros(len(document_vectors))
        for strategy in self.strategies:
            indices, scores = strategy.retrieve(query_vector, document_vectors)
            for idx, score in zip(indices, scores):
                combined_scores[idx] += score
        ranked_indices = np.argsort(combined_scores)[::-1]
        return ranked_indices, [combined_scores[i] for i in ranked_indices]

# Sample data: 4 documents as vectors
documents = [
    np.array([1, 0, 0]),
    np.array([0, 1, 0]),
    np.array([0, 0, 1]),
    np.array([1, 1, 0]),
]
query = np.array([1, 0.5, 0])

print('Similarity Retrieval:')
sim_strategy = SimilarityRetrieval()
indices, scores = sim_strategy.retrieve(query, documents)
for i, score in zip(indices, scores):
    print(f'Doc {i} score: {score:.2f}')

print('\nMMR Retrieval:')
mmr_strategy = MMRRetrieval()
indices, scores = mmr_strategy.retrieve(query, documents, lambda_param=0.7, top_k=3)
for i, score in zip(indices, scores):
    print(f'Doc {i} score: {score:.2f}')

print('\nHybrid Retrieval:')
hybrid_strategy = HybridRetrieval([SimilarityRetrieval(), MMRRetrieval()])
indices, scores = hybrid_strategy.retrieve(query, documents)
for i, score in zip(indices, scores):
    print(f'Doc {i} combined score: {score:.2f}')
Output
Similarity Retrieval:
Doc 3 score: 0.95
Doc 0 score: 0.89
Doc 1 score: 0.45
Doc 2 score: 0.00

MMR Retrieval:
Doc 3 score: 0.95
Doc 0 score: 0.89
Doc 1 score: 0.45

Hybrid Retrieval:
Doc 3 combined score: 1.90
Doc 0 combined score: 1.79
Doc 1 combined score: 0.89
Doc 2 combined score: 0.00
Important Notes
Similarity retrieval is fast but may return very similar results.
MMR helps get diverse results by penalizing similar documents already selected.
Hybrid combines strengths of multiple strategies for better results.
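The hybrid approach in the sample model sums raw scores, which only works when the strategies' scores are on comparable scales. A common rank-based alternative is Reciprocal Rank Fusion (RRF), which combines rankings without looking at raw scores at all. The sketch below is illustrative; the function name is hypothetical, and k=60 is the conventional smoothing constant:

```python
# Reciprocal Rank Fusion (illustrative sketch): fuse several rankings by
# giving each document 1 / (k + rank) credit per ranking it appears in.
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranking in rankings:                    # each ranking: doc ids, best first
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document ids by fused score, highest first
    return sorted(scores, key=scores.get, reverse=True)

# Doc 'b' ranks well in both input lists, so it tops the fused ranking:
print(rrf_fuse([['a', 'b', 'c'], ['b', 'c', 'a']]))  # ['b', 'a', 'c']
```

Because RRF only uses positions, it can fuse rankings from very different retrievers, such as a keyword search and a vector search, without any score normalization.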
Summary
Retrieval strategies find the best matching information for a query.
Similarity ranks by closeness to the query, while MMR balances relevance and diversity.
Hybrid methods combine strategies to improve search quality.