
Retrieval strategies (similarity, MMR, hybrid) in Agentic AI

Introduction

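Retrieval strategies decide which items from a large collection (documents, passages, or facts) are returned for a query. They differ in how they score candidates: by closeness to the query alone, by closeness plus variety, or by combining several methods.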

Use them:
  • When searching for documents that closely match a question.
  • When you want diverse but relevant answers that avoid repetition.
  • When combining different methods to improve search results.
  • When building chatbots that need to find helpful facts quickly.
  • When filtering large data sets to show only the best matches.
Syntax
```python
class RetrievalStrategy:
    def retrieve(self, query, documents):
        pass

class SimilarityRetrieval(RetrievalStrategy):
    def retrieve(self, query, documents):
        # Return documents ranked by similarity to query
        pass

class MMRRetrieval(RetrievalStrategy):
    def retrieve(self, query, documents, lambda_param=0.5):
        # Return documents balancing similarity and diversity
        pass

class HybridRetrieval(RetrievalStrategy):
    def __init__(self, strategies):
        self.strategies = strategies
    def retrieve(self, query, documents):
        # Combine results from multiple strategies
        pass
```

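Similarity retrieval scores each item independently by how close it is to the query (the sample program below uses cosine similarity between vectors) and returns the items in descending order of score.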
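A minimal sketch of similarity retrieval over a matrix of document vectors, assuming the documents are already embedded as NumPy rows. `top_k_by_cosine` is a hypothetical helper name, not a library function; it scores all rows at once instead of looping.

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_matrix, k=3):
    """Return the indices and cosine scores of the k rows closest to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_matrix, dtype=float)
    # Product of norms per row; a zero row (or zero query) gets a score of 0.0
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q)
    scores = np.divide(docs @ q, denom, out=np.zeros(len(docs)), where=denom > 0)
    # Highest score first; the stable sort keeps lower indices first on ties
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]

doc_matrix = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
indices, scores = top_k_by_cosine([1, 0.5, 0], doc_matrix, k=2)
print(indices, np.round(scores, 2))
```

Because every document is scored on its own, nothing stops the top k from being near-copies of each other; that is the gap MMR addresses.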

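MMR (Maximal Marginal Relevance) picks results one at a time, rewarding relevance to the query and penalizing similarity to results already picked, which avoids near-duplicates. lambda_param sets the balance: values near 1 favor relevance, values near 0 favor diversity.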
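Written as a formula, each greedy MMR step picks, from the remaining candidates R \ S, the document that maximizes the expression below. Here q is the query, S is the set of documents already selected, and λ corresponds to lambda_param; this is the same scoring rule the MMRRetrieval class in the sample program applies.

```latex
d^{*} = \arg\max_{d_i \in R \setminus S} \Big[ \lambda \cdot \mathrm{sim}(d_i, q) - (1 - \lambda) \cdot \max_{d_j \in S} \mathrm{sim}(d_i, d_j) \Big]
```

With λ = 1 this reduces to plain similarity ranking; with λ = 0 only novelty counts. When S is empty the max term is taken as 0, which is how the sample program handles the first pick.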

Examples
Retrieve documents most similar to 'apple'.
```python
similarity_strategy = SimilarityRetrieval()
results = similarity_strategy.retrieve('apple', documents)
```
Retrieve documents balancing similarity and diversity with lambda 0.7.
```python
mmr_strategy = MMRRetrieval()
results = mmr_strategy.retrieve('apple', documents, lambda_param=0.7)
```
Combine the similarity and MMR strategies into a single ranking.
```python
hybrid_strategy = HybridRetrieval([SimilarityRetrieval(), MMRRetrieval()])
results = hybrid_strategy.retrieve('apple', documents)
```
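The examples above pass a text query ('apple'), while the strategies compare vectors, so an embedding step has to happen first. A hedged sketch, assuming a toy bag-of-words embedding over a fixed five-word vocabulary; `VOCAB`, `embed`, and `rank_by_similarity` are all hypothetical, and a real system would call a trained embedding model instead.

```python
import numpy as np

# Hypothetical toy vocabulary; each text becomes a vector of word counts over it
VOCAB = ["apple", "banana", "fruit", "phone", "pie"]

def embed(text):
    words = text.lower().split()
    return np.array([words.count(term) for term in VOCAB], dtype=float)

def rank_by_similarity(query, documents):
    """Rank texts by cosine similarity between their toy embeddings and the query's."""
    q = embed(query)
    scored = []
    for i, doc in enumerate(documents):
        d = embed(doc)
        denom = np.linalg.norm(q) * np.linalg.norm(d)
        score = float(q @ d / denom) if denom > 0 else 0.0
        scored.append((score, i))
    # Highest score first; ties fall back to the original document order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[i] for _, i in scored]

documents = ["apple pie", "banana fruit", "apple phone", "fruit salad"]
print(rank_by_similarity("apple", documents))
```

Words outside the vocabulary (such as "salad") are simply ignored, which is one reason real systems rely on learned embeddings.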
Sample Model

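This program shows three retrieval strategies: similarity, MMR, and hybrid. It represents four documents and one query as small 3-dimensional vectors, scores them with cosine similarity, and prints each strategy's ranking with scores. For MMR, the printed score is each selected document's relevance to the query, not its MMR score.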

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot_product / (norm1 * norm2)

class RetrievalStrategy:
    """Base interface: return (ranked document indices, their scores)."""
    def retrieve(self, query_vector, document_vectors):
        raise NotImplementedError

class SimilarityRetrieval(RetrievalStrategy):
    def retrieve(self, query_vector, document_vectors):
        # Score every document independently against the query
        scores = [cosine_similarity(query_vector, doc_vec) for doc_vec in document_vectors]
        # argsort is ascending, so reverse it to put the best match first
        ranked_indices = np.argsort(scores)[::-1]
        return ranked_indices, [scores[i] for i in ranked_indices]

class MMRRetrieval(RetrievalStrategy):
    def retrieve(self, query_vector, document_vectors, lambda_param=0.5, top_k=3):
        selected = []
        candidates = list(range(len(document_vectors)))
        # Relevance of each document to the query, computed once
        scores = [cosine_similarity(query_vector, doc_vec) for doc_vec in document_vectors]
        # Greedily add the candidate with the highest MMR score until top_k are chosen
        while len(selected) < top_k and candidates:
            mmr_scores = []
            for candidate in candidates:
                sim_to_query = scores[candidate]
                # Redundancy: similarity to the closest already-selected document
                sim_to_selected = 0
                if selected:
                    sim_to_selected = max(cosine_similarity(document_vectors[candidate], document_vectors[sel]) for sel in selected)
                mmr_score = lambda_param * sim_to_query - (1 - lambda_param) * sim_to_selected
                mmr_scores.append((mmr_score, candidate))
            mmr_scores.sort(reverse=True)
            best = mmr_scores[0][1]
            selected.append(best)
            candidates.remove(best)
        # Selection order, paired with each document's relevance score (not its MMR score)
        return selected, [scores[i] for i in selected]

class HybridRetrieval(RetrievalStrategy):
    def __init__(self, strategies):
        self.strategies = strategies
    def retrieve(self, query_vector, document_vectors):
        # Sum the scores each strategy returns for each document.
        # MMRRetrieval runs with its defaults here (lambda_param=0.5, top_k=3),
        # so only the 3 documents it selects receive its (relevance) score.
        combined_scores = np.zeros(len(document_vectors))
        for strategy in self.strategies:
            indices, scores = strategy.retrieve(query_vector, document_vectors)
            for idx, score in zip(indices, scores):
                combined_scores[idx] += score
        ranked_indices = np.argsort(combined_scores)[::-1]
        return ranked_indices, [combined_scores[i] for i in ranked_indices]

# Sample data: 4 documents as vectors
documents = [
    np.array([1, 0, 0]),
    np.array([0, 1, 0]),
    np.array([0, 0, 1]),
    np.array([1, 1, 0])
]

query = np.array([1, 0.5, 0])

print('Similarity Retrieval:')
sim_strategy = SimilarityRetrieval()
indices, scores = sim_strategy.retrieve(query, documents)
for i, score in zip(indices, scores):
    print(f'Doc {i} score: {score:.2f}')

print('\nMMR Retrieval:')
mmr_strategy = MMRRetrieval()
indices, scores = mmr_strategy.retrieve(query, documents, lambda_param=0.7, top_k=3)
for i, score in zip(indices, scores):
    print(f'Doc {i} score: {score:.2f}')

print('\nHybrid Retrieval:')
hybrid_strategy = HybridRetrieval([SimilarityRetrieval(), MMRRetrieval()])
indices, scores = hybrid_strategy.retrieve(query, documents)
for i, score in zip(indices, scores):
    print(f'Doc {i} combined score: {score:.2f}')
```
Important Notes

Similarity retrieval is fast and simple, but because it scores each document independently it can return several near-duplicate results.

MMR produces more diverse results by penalizing candidates that are similar to documents already selected; lambda_param controls how strong that penalty is.
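To see the penalty at work, the sketch below reruns greedy MMR on the sample program's four toy vectors at three values of lambda_param. `mmr_select` is a hypothetical standalone helper that applies the same scoring rule as MMRRetrieval (its tie-breaking may differ).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(np.dot(a, b) / (na * nb))

def mmr_select(query, docs, lambda_param, top_k):
    """Greedy MMR selection; returns document indices in the order they were picked."""
    relevance = [cosine(query, d) for d in docs]
    selected, candidates = [], list(range(len(docs)))
    while candidates and len(selected) < top_k:
        def mmr_score(i):
            # Penalty: similarity to the closest document picked so far (0 if none)
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_param * relevance[i] - (1 - lambda_param) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

docs = [np.array(v, dtype=float) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0])]
query = np.array([1, 0.5, 0])
for lam in (1.0, 0.7, 0.5):
    print(lam, mmr_select(query, docs, lam, top_k=3))
```

At λ = 1.0 and 0.7 the picks follow similarity order ([3, 0, 1]). At λ = 0.5 the redundancy penalty outweighs Doc 1's modest relevance, so the orthogonal Doc 2 is picked third ([3, 0, 2]).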

Hybrid retrieval combines the strengths of several strategies. How their scores are scaled and weighted before combining largely determines the final ranking.
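Summing raw scores, as the sample HybridRetrieval does, is sensitive to how each strategy scales its scores. A common alternative, swapped in here as an illustration rather than as this lesson's method, is Reciprocal Rank Fusion (RRF), which uses only each document's rank in every list. The constant k = 60 is a widely used default; treat it as an assumption to tune. The input rankings are the orders SimilarityRetrieval and MMRRetrieval (lambda_param=0.5, top_k=3) produce on the sample program's toy vectors.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; each list adds 1 / (k + rank), rank starting at 1."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)

similarity_ranking = [3, 0, 1, 2]  # SimilarityRetrieval order on the toy vectors
mmr_ranking = [3, 0, 2]            # MMRRetrieval order with lambda_param=0.5, top_k=3
print(reciprocal_rank_fusion([similarity_ranking, mmr_ranking]))
```

Doc 2, which only MMR selected, moves ahead of Doc 1 in the fused list, so the diversity signal survives fusion even though Doc 2's similarity score is 0.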

Summary

Retrieval strategies find the best matching information for a query.

Similarity ranks by closeness to the query; MMR balances relevance and diversity.

Hybrid methods combine strategies to improve search quality.

Practice

1. Which retrieval strategy focuses on ranking results purely based on how close they are to the query?
easy
A. Random retrieval
B. Maximal Marginal Relevance (MMR)
C. Similarity-based retrieval
D. Hybrid retrieval

Solution

  1. Step 1: Understand similarity-based retrieval

    Similarity-based retrieval ranks results by how close or similar they are to the query, focusing only on relevance.
  2. Step 2: Compare with other strategies

    MMR balances relevance and diversity, hybrid combines methods, and random is unrelated.
  3. Final Answer:

    Similarity-based retrieval -> Option C
  4. Quick Check:

    Similarity = closeness only [OK]
Hint: Similarity means closest match only [OK]
Common Mistakes:
  • Confusing MMR with similarity
  • Thinking hybrid is only similarity
  • Choosing random as a valid strategy
2. Which of the following best describes Maximal Marginal Relevance (MMR)?
easy
A. Combines all retrieval methods without weighting
B. Ranks results by random selection
C. Only uses keyword matching
D. Balances relevance and diversity in retrieval

Solution

  1. Step 1: Define MMR

    MMR is designed to balance relevance to the query and diversity among the results to avoid redundancy.
  2. Step 2: Eliminate incorrect options

    Random selection is unrelated, keyword matching is too narrow, and combining without weighting is not MMR.
  3. Final Answer:

    Balances relevance and diversity in retrieval -> Option D
  4. Quick Check:

    MMR = relevance + diversity [OK]
Hint: MMR mixes relevance with diversity [OK]
Common Mistakes:
  • Thinking MMR is random
  • Assuming MMR uses only keywords
  • Believing MMR combines methods blindly
3. Given the following pseudo-code for a hybrid retrieval method combining similarity and MMR scores:
```python
results = []
for doc in documents:
    sim_score = similarity(query, doc)
    mmr_score = mmr(query, doc, results)
    combined_score = 0.6 * sim_score + 0.4 * mmr_score
    results.append((doc, combined_score))
results.sort(key=lambda x: x[1], reverse=True)
print([doc for doc, score in results[:3]])
```
What does this code output?
medium
A. Top 3 documents ranked by combined similarity and MMR scores
B. Top 3 documents ranked by similarity score only
C. Top 3 documents ranked by MMR score only
D. Random 3 documents from the list

Solution

  1. Step 1: Analyze score calculation

    The code calculates a combined score using 60% similarity and 40% MMR for each document.
  2. Step 2: Understand sorting and output

    Documents are sorted by this combined score in descending order, then top 3 are printed.
  3. Final Answer:

    Top 3 documents ranked by combined similarity and MMR scores -> Option A
  4. Quick Check:

    Hybrid = combined scores [OK]
Hint: Check weighted sum and sorting for final ranking [OK]
Common Mistakes:
  • Ignoring MMR score in combined score
  • Assuming sorting by similarity only
  • Thinking output is random
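A runnable version of the pseudo-code, assuming cosine similarity for `similarity` and an MMR-style score for `mmr` (both are hypothetical definitions, since the question leaves them abstract), and storing document indices instead of documents. One caveat: because the loop measures redundancy against every earlier document in list order, this one-pass scoring depends on input order and is not the same as the greedy MMR selection loop.

```python
import numpy as np

def similarity(query, doc):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(query) * np.linalg.norm(doc)
    return float(np.dot(query, doc) / denom) if denom > 0 else 0.0

def mmr(query, doc, previous_docs, lambda_param=0.5):
    """Hypothetical MMR-style score of one document against earlier documents."""
    redundancy = max((similarity(doc, p) for p in previous_docs), default=0.0)
    return lambda_param * similarity(query, doc) - (1 - lambda_param) * redundancy

query = np.array([1, 0.5, 0])
documents = [np.array(v) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0])]

results = []
for i, doc in enumerate(documents):
    sim_score = similarity(query, doc)
    # Compare against the documents scored so far, not the (index, score) tuples
    mmr_score = mmr(query, doc, [documents[j] for j, _ in results])
    combined_score = 0.6 * sim_score + 0.4 * mmr_score
    results.append((i, combined_score))
results.sort(key=lambda x: x[1], reverse=True)
print([i for i, score in results[:3]])
```

Doc 3 is the closest match to the query, but it is scored last and overlaps with Doc 0, so its MMR term drags it to second place behind Doc 0.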
4. Consider this buggy code snippet for MMR retrieval:
```python
def mmr(query, docs, selected):
    scores = []
    for doc in docs:
        relevance = similarity(query, doc)
        diversity = min([similarity(doc, s) for s in selected])
        score = relevance - 0.5 * diversity
        scores.append((doc, score))
    return max(scores, key=lambda x: x[1])[0]
```
What is the main error causing a crash when selected is empty?
medium
A. Using min() on an empty list causes an error
B. Incorrect use of max() function
C. Missing return statement
D. Similarity function is undefined

Solution

  1. Step 1: Identify cause of crash

    When selected is empty, the list inside min() is empty, causing a ValueError.
  2. Step 2: Understand min() behavior

    min() raises a ValueError on an empty sequence (unless a default= value is passed), so the code crashes at that line.
  3. Final Answer:

    Using min() on an empty list causes an error -> Option A
  4. Quick Check:

    min(empty list) = error [OK]
Hint: Check min() on empty lists for errors [OK]
Common Mistakes:
  • Blaming max() instead of min()
  • Ignoring empty list edge case
  • Assuming similarity is undefined
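One way to fix the snippet, as a sketch: pass default=0.0 so an empty selected list yields zero redundancy, and use max() rather than min(), since standard MMR penalizes a candidate by its similarity to the closest already-selected document. This version also assumes selected holds indices into docs (a change from the original signature) so chosen documents can be skipped, and it uses the lambda_param weighting from the sample program.

```python
import numpy as np

def similarity(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def mmr(query, docs, selected, lambda_param=0.5):
    """Return the index of the next document to pick; `selected` holds chosen indices."""
    best_index, best_score = None, float("-inf")
    for i, doc in enumerate(docs):
        if i in selected:
            continue  # skip documents that were already picked
        relevance = similarity(query, doc)
        # max() with a default: no ValueError when nothing is selected yet
        redundancy = max((similarity(doc, docs[j]) for j in selected), default=0.0)
        score = lambda_param * relevance - (1 - lambda_param) * redundancy
        if score > best_score:
            best_index, best_score = i, score
    return best_index  # None if every document is already selected

query = np.array([1, 0.5, 0])
docs = [np.array(v) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0])]
selected = []
for _ in range(3):
    selected.append(mmr(query, docs, selected))
print(selected)
```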
5. You want to improve a search system by combining similarity and MMR retrieval. Which approach best balances relevance and diversity in the final results?
hard
A. Use MMR with a diversity weight of zero
B. Combine similarity and MMR scores with adjustable weights
C. Use only similarity scores to rank results
D. Randomly shuffle results after similarity ranking

Solution

  1. Step 1: Understand the goal

    Balancing relevance and diversity requires combining both similarity and MMR scores meaningfully.
  2. Step 2: Evaluate options

    Using only similarity or zero diversity weight ignores diversity; random shuffling loses relevance order.
  3. Step 3: Best approach

    Combining similarity and MMR with adjustable weights allows tuning the balance effectively.
  4. Final Answer:

    Combine similarity and MMR scores with adjustable weights -> Option B
  5. Quick Check:

    Hybrid weighted combination = best balance [OK]
Hint: Adjust weights to balance relevance and diversity [OK]
Common Mistakes:
  • Ignoring diversity by using similarity only
  • Setting diversity weight to zero in MMR
  • Randomizing results without scoring
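A sketch of option B, assuming each strategy supplies one score per document in the same document order. Min-max normalization puts both lists on a 0 to 1 scale before weighting, so neither dominates merely because of its range. `min_max`, `weighted_fusion`, and both score lists below are hypothetical.

```python
import numpy as np

def min_max(scores):
    """Rescale scores to [0, 1]; a list of identical scores maps to all zeros."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

def weighted_fusion(score_lists, weights):
    """Weighted sum of normalized score lists; returns (ranked indices, fused scores)."""
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per score list")
    fused = sum(w * min_max(s) for w, s in zip(weights, score_lists))
    order = np.argsort(-fused, kind="stable")
    return order, fused[order]

similarity_scores = [0.9, 0.4, 0.0, 0.95]  # hypothetical relevance scores
diversity_scores = [0.2, 0.9, 1.0, 0.1]    # hypothetical novelty scores
for weights in ([1.0, 0.0], [0.3, 0.7]):
    order, fused = weighted_fusion([similarity_scores, diversity_scores], weights)
    print(weights, order.tolist())
```

With weights [1.0, 0.0] the order is pure relevance ([3, 0, 1, 2]); shifting weight to diversity ([0.3, 0.7]) promotes Docs 1 and 2 ([1, 2, 0, 3]). The weights are the tuning knob option B refers to.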