Agentic AI (~20 mins)

Retrieval strategies (similarity, MMR, hybrid) in Agentic AI - ML Experiment: Train & Evaluate

Experiment - Retrieval strategies (similarity, MMR, hybrid)
Problem: You have a document search system that ranks results by similarity score. It currently returns many near-duplicate documents, which reduces the usefulness of the search results.
Current Metrics: Precision@5: 80%, Diversity score: 0.3 (low diversity)
Issue: The retrieval strategy overemphasizes similarity to the query, causing redundant results and low diversity.
Your Task
Improve the retrieval strategy to increase diversity of top 5 results while maintaining precision above 75%.
You can only modify the retrieval ranking method.
You cannot change the underlying document embeddings or similarity calculation.
You must keep the system efficient for real-time queries.
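For reference, the baseline the problem describes — pure similarity ranking — can be sketched as follows (a minimal sketch; the function name and array shapes are assumptions, not part of the exercise). Note that it ranks each document against the query in isolation, which is exactly why near-duplicates all make the top 5:

```python
import numpy as np

def similarity_top_k(doc_embeddings, query_embedding, top_k=5):
    # Cosine similarity between the query and every document
    sims = doc_embeddings @ query_embedding / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding) + 1e-10)
    # Pure similarity ranking: take the k highest-scoring documents,
    # ignoring how similar they are to each other
    return np.argsort(-sims)[:top_k].tolist()
```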
Solution
import numpy as np

def mmr(doc_embeddings, query_embedding, lambda_param=0.5, top_k=5):
    # Calculate similarity between query and documents
    sim_to_query = np.dot(doc_embeddings, query_embedding) / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding) + 1e-10)

    selected = []
    candidates = list(range(len(doc_embeddings)))

    while len(selected) < top_k and candidates:
        mmr_scores = []
        for idx in candidates:
            # Redundancy = highest cosine similarity to any already-selected doc
            if not selected:
                redundancy = 0.0
            else:
                redundancy = max(
                    np.dot(doc_embeddings[idx], doc_embeddings[s]) /
                    (np.linalg.norm(doc_embeddings[idx]) * np.linalg.norm(doc_embeddings[s]) + 1e-10)
                    for s in selected
                )
            # MMR: reward similarity to the query, penalize redundancy
            score = lambda_param * sim_to_query[idx] - (1 - lambda_param) * redundancy
            mmr_scores.append((score, idx))

        mmr_scores.sort(reverse=True)
        best_score, best_idx = mmr_scores[0]
        selected.append(best_idx)
        candidates.remove(best_idx)

    return selected

# Example usage:
# doc_embeddings: numpy array of shape (num_docs, embedding_dim)
# query_embedding: numpy array of shape (embedding_dim,)

# Assume doc_embeddings and query_embedding are given

# selected_indices = mmr(doc_embeddings, query_embedding, lambda_param=0.7, top_k=5)

# This returns indices of documents balancing similarity and diversity.
Implemented Maximal Marginal Relevance (MMR) to select documents.
Added a lambda parameter to balance similarity to query and diversity among selected documents.
Replaced pure similarity ranking with MMR-based ranking to reduce redundancy.
Results Interpretation

Before: Precision@5 = 80%, Diversity = 0.3 (low diversity, many similar results)

After: Precision@5 = 78%, Diversity = 0.65 (higher diversity, less redundancy)

Using MMR helps balance relevance and diversity in retrieval results, reducing redundancy while keeping precision high.
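The exercise does not define how the diversity score is computed. One common proxy (an assumption here, not the exercise's own metric) is 1 minus the mean pairwise cosine similarity among the returned documents — identical results score 0, mutually orthogonal results score 1:

```python
import numpy as np

def diversity_score(doc_embeddings, selected):
    # Diversity proxy: 1 minus the mean pairwise cosine similarity
    # among the selected documents (higher = less redundant)
    vecs = doc_embeddings[selected]
    unit = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-10)
    sims = unit @ unit.T
    n = len(selected)
    # Average over off-diagonal pairs only (exclude self-similarity)
    pairwise_mean = (sims.sum() - n) / (n * (n - 1))
    return 1.0 - pairwise_mean
```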
Bonus Experiment
Try a hybrid retrieval strategy that first filters top 10 documents by similarity, then applies MMR to select the final 5 results.
💡 Hint
This can improve efficiency by reducing the candidate set before applying the more complex MMR ranking.
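A minimal sketch of this hybrid strategy, with the greedy MMR step inlined so the snippet is self-contained (the function and parameter names are assumptions): stage 1 keeps only the `pre_k` most similar documents, stage 2 runs MMR on that small candidate set.

```python
import numpy as np

def hybrid_retrieve(doc_embeddings, query_embedding, pre_k=10, top_k=5, lambda_param=0.7):
    # Stage 1: cheap similarity filter down to the pre_k closest documents
    unit_docs = doc_embeddings / (np.linalg.norm(doc_embeddings, axis=1, keepdims=True) + 1e-10)
    unit_q = query_embedding / (np.linalg.norm(query_embedding) + 1e-10)
    sims = unit_docs @ unit_q
    candidates = list(np.argsort(-sims)[:pre_k])

    # Stage 2: greedy MMR over the small candidate set only
    selected = []
    while len(selected) < top_k and candidates:
        best_idx, best_score = None, -np.inf
        for idx in candidates:
            if selected:
                # Redundancy = highest similarity to any already-selected doc
                redundancy = max(float(unit_docs[idx] @ unit_docs[s]) for s in selected)
            else:
                redundancy = 0.0
            score = lambda_param * sims[idx] - (1 - lambda_param) * redundancy
            if score > best_score:
                best_score, best_idx = score, idx
        selected.append(best_idx)
        candidates.remove(best_idx)
    return [int(i) for i in selected]
```

Because MMR's inner loop compares each candidate against every selected document, shrinking the candidate set from the whole corpus to `pre_k` keeps the reranking cost constant regardless of corpus size.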