Overview - Similarity search vs MMR retrieval

What is it?

Similarity search and MMR retrieval are two methods used to find relevant information from a collection of documents or data points. Similarity search finds items most like a query based on closeness in meaning or features. MMR retrieval (Maximal Marginal Relevance) balances relevance with diversity to avoid repetitive or overly similar results. Both help computers pick useful answers from large data sets.

Why it matters

Without similarity search, a query over a large collection returns many irrelevant results. Without a diversity step such as MMR, the top results are often near-duplicates of one another. Either way, useful information is hard to find quickly. These methods improve search quality in chatbots, recommendation systems, and knowledge bases, which saves users time and frustration.

Where it fits

Learners should first understand basic vector representations and embeddings, which turn text into numbers for comparison. After mastering similarity search, they can explore MMR retrieval to improve result diversity. Later, they can study advanced retrieval techniques and how these fit into full language model pipelines.

Mental Model

Core Idea

Similarity search finds the closest matches to a query, while MMR retrieval finds relevant but diverse results to avoid repetition.

Think of it like...

Imagine picking fruits from a basket: similarity search picks the fruits that look most like the one you want, while MMR retrieval picks fruits that are both good matches and different from each other, so you get variety.

Query
  │
  ▼
[Similarity Search]───> Returns top N closest items by similarity
  │
  ▼
[MMR Retrieval]──────> Returns top N items balancing similarity and diversity

Similarity Search: Focus on closeness
MMR Retrieval: Balance closeness + variety

Build-Up - 7 Steps

1

Foundation: Understanding Vector Similarity Search

Concept: Learn how similarity search uses vector math to find closest matches.

Texts or documents are converted into vectors (lists of numbers) using embeddings. Similarity search compares these vectors to a query vector using measures like cosine similarity. The items with the highest similarity scores are returned as the closest matches.

Result

You get a list of items ranked by how close their meaning is to your query.

Understanding vector similarity is key because it turns complex text meaning into simple math, enabling fast and effective search.
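
As a rough illustration of the step above, here is a minimal sketch of cosine-similarity search using NumPy. The function name `cosine_top_k` and the tiny 3-dimensional vectors are made up for this example; real embeddings come from an embedding model and have hundreds of dimensions.

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Return (indices, scores) of the k documents most similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # Normalize to unit length so a dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    # Sort by descending score and keep the first k.
    order = np.argsort(-scores, kind="stable")[:k]
    return order.tolist(), scores[order].tolist()

# Toy "embeddings" (hypothetical values, for illustration only).
docs = np.array([
    [1.0, 0.0, 0.0],  # doc 0
    [0.9, 0.1, 0.0],  # doc 1: nearly the same direction as doc 0
    [0.0, 1.0, 0.0],  # doc 2
    [0.0, 0.0, 1.0],  # doc 3
])
query = np.array([1.0, 0.2, 0.0])

indices, scores = cosine_top_k(query, docs, k=2)
print(indices)  # [1, 0]
```

Note that the two results are almost identical to each other, which is exactly the redundancy MMR is meant to reduce.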

2

Foundation: Basics of Maximal Marginal Relevance (MMR)

3

Intermediate: Comparing Similarity Search and MMR Retrieval

4

Intermediate: Implementing Similarity Search in LangChain

5

Intermediate: Applying MMR Retrieval in LangChain

6

Advanced: Tuning MMR Parameters for Best Results

7

Expert: Surprising Effects of MMR on Large Datasets

Under the Hood

Similarity search works by converting text into vectors in a high-dimensional space and measuring distances or angles between these vectors to find the closest matches. MMR retrieval builds on this by iteratively selecting documents that maximize a combined score of similarity to the query and dissimilarity to already chosen documents, using a formula that balances these two factors.
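
The combined score mentioned above is usually written in the following form. The symbols are not from this lesson and are introduced here for illustration: q is the query, R the set of candidate documents, S the documents already selected, sim a similarity function such as cosine similarity, and λ a weight between 0 and 1.

```latex
\mathrm{MMR} = \arg\max_{d_i \in R \setminus S}
  \Bigl[\, \lambda \,\mathrm{sim}(d_i, q)
  \;-\; (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d_i, d_j) \Bigr]
```

With λ = 1 the second term vanishes and the selection reduces to plain similarity ranking; as λ approaches 0 the penalty for resembling an already selected document dominates.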

Why designed this way?

Similarity search was designed to quickly find the most relevant items using simple math, making it efficient and effective for many tasks. MMR was introduced to solve the problem of redundant results by adding a diversity component, improving user experience in exploratory search and recommendation. MMR is itself a re-ranking step applied on top of similarity scores. Alternatives such as clustering the results can also diversify output, but MMR reuses the similarity scores already computed and exposes a single weight to tune.

Query Vector
   │
   ▼
[Vector Store]───> Compute similarity scores
   │                 │
   ▼                 ▼
Similarity Search   MMR Retrieval
   │                 │
   ▼                 ▼
Top N closest        Iterative selection
items by similarity  balancing relevance + diversity
   │                 │
   ▼                 ▼
Returned list    Returned diverse list
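
The right-hand branch of the diagram can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not any library's implementation: the function name `mmr_select` and the toy vectors are hypothetical, and the weight is called `lambda_mult` with the convention that 1 means relevance only and 0 means maximum diversity.

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy MMR: return indices of up to k documents, in selection order."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)

    sim_to_query = d @ q   # relevance of each document to the query
    sim_between = d @ d.T  # pairwise document-to-document similarity

    selected = []
    candidates = list(range(len(d)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -np.inf
        for i in candidates:
            # Redundancy: similarity to the closest already-selected document
            # (0.0 on the first pass, when nothing has been selected yet).
            redundancy = max((sim_between[i, j] for j in selected), default=0.0)
            score = lambda_mult * sim_to_query[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

# Toy "embeddings" (hypothetical values): docs 0 and 1 are near-duplicates.
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.2, 0.0])

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [1, 0]
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # [1, 2]
```

With `lambda_mult=1.0` the result matches plain similarity ranking and returns both near-duplicates. With `lambda_mult=0.5` the second pick skips the near-duplicate in favor of a less similar but still related document.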

Myth Busters - 4 Common Misconceptions

Quick: Does MMR retrieval always return less relevant results than similarity search? Commit to yes or no.

Common Belief: MMR retrieval sacrifices too much relevance for diversity, so it returns worse results.

Quick: Is similarity search always faster than MMR retrieval? Commit to yes or no.

Common Belief: Similarity search is always faster because it just ranks by closeness once.

Quick: Does similarity search guarantee finding all relevant documents? Commit to yes or no.

Common Belief: Similarity search always finds all relevant documents because it ranks by closeness.

Quick: Can MMR retrieval be used without vector embeddings? Commit to yes or no.

Common Belief: MMR retrieval requires vector embeddings to compute similarity and diversity.

Expert Zone

1

MMR's effectiveness depends heavily on the quality and dimensionality of embeddings; poor embeddings reduce both relevance and diversity benefits.

2

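The choice of similarity metric (cosine, Euclidean, dot product) affects both similarity search and MMR results. For unit-normalized vectors the three metrics produce the same ranking, but for unnormalized vectors dot product favors long vectors and the rankings can differ.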

3

In some cases, combining MMR with user feedback loops can dynamically adjust diversity preferences, improving personalization.
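
To make the second point above concrete, the sketch below ranks the same three documents under cosine similarity, dot product, and Euclidean distance, first on raw vectors and then on unit-normalized ones. The helper name `rank_by_metric` and the 2-dimensional vectors are hypothetical, chosen so the effect is easy to check by hand.

```python
import numpy as np

def rank_by_metric(query_vec, doc_vecs):
    """Rank document indices (best first) under three common metrics."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    cosine = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    dot = d @ q
    euclidean = np.linalg.norm(d - q, axis=1)
    return {
        "cosine": np.argsort(-cosine, kind="stable").tolist(),
        "dot": np.argsort(-dot, kind="stable").tolist(),
        # Euclidean is a distance, so smaller is better.
        "euclidean": np.argsort(euclidean, kind="stable").tolist(),
    }

# Doc 1 is long but points 45 degrees away from the query.
docs = np.array([[1.0, 0.0], [5.0, 5.0], [2.0, 1.0]])
query = np.array([1.0, 0.0])

raw = rank_by_metric(query, docs)
unit_docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
unit = rank_by_metric(query / np.linalg.norm(query), unit_docs)

print(raw)   # dot product puts the long vector (doc 1) first
print(unit)  # all three metrics agree after normalization
```

On the raw vectors, dot product ranks the longest vector first even though it is the least aligned with the query. After normalization, all three metrics return the same order.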

When NOT to use

Avoid MMR retrieval when you need the absolute top matches without any diversity, such as exact fact retrieval or legal document search. Instead, use pure similarity search or exact keyword matching. Also, for very large datasets with strict latency requirements, approximate nearest neighbor search without MMR may be preferable.

Production Patterns

In production, similarity search is often the first step to quickly narrow candidates, followed by MMR re-ranking to diversify results. Hybrid pipelines combine vector search with metadata filters and MMR to balance speed, relevance, and diversity. Monitoring user interactions helps tune MMR parameters over time.

Connections

Recommendation Systems

MMR retrieval is similar to diversification techniques used in recommendations to avoid showing users repetitive items.

Understanding MMR helps grasp how recommendation engines balance relevance and novelty to keep users engaged.

Information Retrieval

Similarity search is a core technique in information retrieval for ranking documents by relevance.

Knowing similarity search deepens understanding of classic search engines and modern semantic search.

Portfolio Optimization (Finance)

MMR's balance of relevance and diversity parallels how portfolios balance expected return and risk through diversification.

Recognizing this connection shows how balancing competing goals is a common problem across fields.

Common Pitfalls

#1 Returning many very similar documents that overwhelm the user.

Wrong approach: results = vector_store.similarity_search(query, k=10)

Correct approach: results = vector_store.max_marginal_relevance_search(query, k=10, lambda_mult=0.5)

Root cause: Using pure similarity search without diversity leads to redundant results.

#2 Weighting diversity too heavily in MMR (lambda set too low), causing irrelevant results.

Wrong approach: results = vector_store.max_marginal_relevance_search(query, k=10, lambda_mult=0.1)

Correct approach: results = vector_store.max_marginal_relevance_search(query, k=10, lambda_mult=0.7)

Root cause: Misunderstanding the lambda parameter. Values near 0 emphasize diversity and values near 1 emphasize relevance, so a low value overemphasizes diversity at the cost of relevance.
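
A small worked example may help show why the lambda value matters. The numbers are hypothetical. Assume the usual MMR score, λ times similarity to the query minus (1 − λ) times the highest similarity to an already selected document. Candidate A is highly relevant (0.90) but resembles an earlier pick (0.80); candidate B is only weakly relevant (0.30) but unlike anything selected so far (0.05).

```latex
\begin{aligned}
\lambda = 0.1:\quad
  \text{A} &= 0.1(0.90) - 0.9(0.80) = 0.09 - 0.72 = -0.63 \\
  \text{B} &= 0.1(0.30) - 0.9(0.05) = 0.03 - 0.045 = -0.015
  \quad\Rightarrow\ \text{B is selected} \\[4pt]
\lambda = 0.7:\quad
  \text{A} &= 0.7(0.90) - 0.3(0.80) = 0.63 - 0.24 = 0.39 \\
  \text{B} &= 0.7(0.30) - 0.3(0.05) = 0.21 - 0.015 = 0.195
  \quad\Rightarrow\ \text{A is selected}
\end{aligned}
```

At λ = 0.1 the weakly relevant candidate wins purely because it is different, which is how irrelevant results creep in.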

#3 Assuming similarity search always finds all relevant documents.

Wrong approach: results = vector_store.similarity_search(query, k=5) # No fallback or expansion

Correct approach:
results = vector_store.similarity_search(query, k=5)
if len(results) < desired:
    results += vector_store.similarity_search(expanded_query, k=5)

Root cause: Ignoring embedding limitations and query ambiguity leads to missed relevant results.

Key Takeaways

Similarity search finds the closest matches to a query using vector comparisons, making it fast and effective for relevance.

MMR retrieval improves search results by balancing relevance with diversity, reducing repetitive answers.

Choosing between similarity search and MMR depends on whether you want focused or varied results for your users.

Tuning MMR parameters is essential to get the right balance and avoid poor user experiences.

Understanding the internal workings and trade-offs of these methods helps build smarter, more user-friendly search systems.