
Metadata filtering in vector stores in LangChain

Introduction

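Metadata filtering restricts a vector store search to items whose metadata (extra information stored alongside each item, such as author, date, or category) matches conditions you specify. Results become more relevant because unrelated items are excluded, and searches can also be faster in stores that apply the filter before comparing vectors.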

You want to search documents but only from a certain date or author.
You have many items and want to narrow results by category or tag.
You want to exclude some items from search results based on their properties.
You want to combine text similarity with specific conditions like language or type.
Syntax
results = vector_store.similarity_search(query, filter={"key": "value"})
The filter is a dictionary of metadata keys and the values they must match.
Only items whose metadata matches every key-value pair in the filter are returned.
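The matching rule can be pictured with a small pure-Python sketch. The helper name matches_filter and the sample items are hypothetical, and this is not LangChain's own code; it assumes the common behavior where every key-value pair in the filter must equal the stored metadata, which individual stores may extend with richer operators.

```python
def matches_filter(metadata: dict, conditions: dict) -> bool:
    """Return True when every condition key is stored with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in conditions.items()
    )


# Hypothetical items, shaped like text plus metadata
items = [
    {"text": "Climate report", "metadata": {"year": "2023", "language": "English"}},
    {"text": "Older climate report", "metadata": {"year": "2019", "language": "English"}},
]

# Keep only the items whose metadata satisfies the filter
kept = [
    item["text"]
    for item in items
    if matches_filter(item["metadata"], {"year": "2023"})
]
print(kept)
```

Adding a second pair to the filter, such as "language", can only narrow the result further, because all pairs have to match at once.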
Examples
Search for items about 'climate change' but only from the year 2023.
results = vector_store.similarity_search("climate change", filter={"year": "2023"})
Find dessert recipes that are easy to make.
results = vector_store.similarity_search("recipe", filter={"category": "dessert", "difficulty": "easy"})
Search machine learning documents only in English.
results = vector_store.similarity_search("machine learning", filter={"language": "English"})
Sample Program

This example creates a vector store with three documents, each having metadata about topic and level. It searches for documents related to 'Python' but only those tagged as programming and beginner level. The output shows matching document texts.

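# Recent LangChain releases ship these classes in separate packages
# (langchain-community and langchain-openai); FAISS also needs faiss-cpu.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings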

# Sample documents with metadata
documents = [
    {"text": "Learn Python basics.", "metadata": {"topic": "programming", "level": "beginner"}},
    {"text": "Advanced Python techniques.", "metadata": {"topic": "programming", "level": "advanced"}},
    {"text": "Cooking pasta recipes.", "metadata": {"topic": "cooking", "level": "beginner"}}
]

# Create embeddings (reads the API key from the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings()

# Prepare texts and metadata
texts = [doc["text"] for doc in documents]
metadatas = [doc["metadata"] for doc in documents]

# Create vector store with metadata
vector_store = FAISS.from_texts(texts, embeddings, metadatas=metadatas)

# Search for programming documents at beginner level
results = vector_store.similarity_search("Python", filter={"topic": "programming", "level": "beginner"})

for r in results:
    print(r.page_content)
Output
Learn Python basics.
Important Notes

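Filter keys and values must match the stored metadata exactly, including spelling, letter case, and data type. For example, a filter value of "2023" (a string) does not match a stored value of 2023 (an integer).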
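When a filter unexpectedly returns nothing, it can help to compare it against one item's stored metadata. The sketch below is a plain-Python debugging aid, not part of LangChain; the helper name explain_mismatch and the sample metadata are hypothetical, and it assumes the store compares values with strict equality.

```python
def explain_mismatch(metadata: dict, conditions: dict) -> list:
    """List the reasons a filter fails against one item's metadata."""
    problems = []
    for key, wanted in conditions.items():
        if key not in metadata:
            problems.append(f"key {key!r} is not stored")
        elif type(metadata[key]) is not type(wanted):
            problems.append(
                f"{key!r}: stored {type(metadata[key]).__name__}, "
                f"filter {type(wanted).__name__}"
            )
        elif metadata[key] != wanted:
            problems.append(f"{key!r}: stored {metadata[key]!r}, filter {wanted!r}")
    return problems


# Hypothetical metadata as it was stored with one item
stored = {"year": 2023, "language": "English"}

print(explain_mismatch(stored, {"year": "2023"}))         # type differs
print(explain_mismatch(stored, {"Language": "English"}))  # key case differs
print(explain_mismatch(stored, {"language": "english"}))  # value case differs
print(explain_mismatch(stored, {"year": 2023}))           # empty list: filter matches
```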

Not all vector stores support metadata filtering; check your store's documentation.
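If a store has no built-in filtering, one possible fallback is to run an unfiltered search and filter the results in Python afterwards. The sketch below assumes results arrive ordered by similarity and expose page_content and metadata; the Doc class is a hypothetical stand-in for a search result, and post_filter is an illustrative helper rather than a LangChain function.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a search result with text and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def post_filter(results, conditions, k=4):
    """Keep the first k results whose metadata equals every condition."""
    kept = [
        doc for doc in results
        if all(
            key in doc.metadata and doc.metadata[key] == value
            for key, value in conditions.items()
        )
    ]
    return kept[:k]


# Pretend these came back from an unfiltered search, most similar first
unfiltered = [
    Doc("Advanced Python techniques.", {"topic": "programming", "level": "advanced"}),
    Doc("Learn Python basics.", {"topic": "programming", "level": "beginner"}),
    Doc("Cooking pasta recipes.", {"topic": "cooking", "level": "beginner"}),
]

beginner = post_filter(unfiltered, {"topic": "programming", "level": "beginner"}, k=2)
for doc in beginner:
    print(doc.page_content)
```

With this approach it is usually worth requesting more results than you need from the unfiltered search (a larger k), so that enough of them survive the filter.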

Filtering helps reduce noise and improves search relevance.

Summary

Metadata filtering narrows search results using the extra information stored with each item.

Use a dictionary with key-value pairs to filter results.

It works well when you want specific categories or properties in your search.