
Hybrid search (keyword + semantic) in LangChain

Introduction

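Hybrid search combines keyword search, which matches the exact words in a query, with semantic search, which compares vector embeddings to find text with a similar meaning. Each method catches results the other misses, so the combined results are usually more relevant than either search alone.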

Use hybrid search in situations like these:
When you want documents that contain specific keywords and are also related to the meaning of your query.
When searching a large collection of text where exact matches alone miss relevant results.
When building a search feature that needs to understand user intent better than plain keyword search.
When you want to combine fast, precise keyword matching with semantic ranking that recognizes synonyms and paraphrases.
When users expect results that are both precise and context-aware.
Syntax
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create the embedding model (reads the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings()

# Load a FAISS index saved earlier with vectorstore.save_local("index_path").
# Loading unpickles data, so recent versions require an explicit opt-in for trusted files.
vectorstore = FAISS.load_local(
    "index_path", embeddings, allow_dangerous_deserialization=True
)

# Keyword search: keep documents whose text contains the keyword (case-insensitive)
def keyword_search(docs, keyword):
    return [doc for doc in docs if keyword.lower() in doc.page_content.lower()]

# Run both searches
query = "your search query"
# docstore._dict is a private attribute of the default InMemoryDocstore holding every stored Document
all_docs = list(vectorstore.docstore._dict.values())
keyword_results = keyword_search(all_docs, "keyword")
semantic_results = vectorstore.similarity_search(query, k=5)

# Merge: semantic results first, then keyword-only hits.
# Deduplicate on page_content, because Document equality also compares metadata and ids.
seen = {doc.page_content for doc in semantic_results}
combined_results = semantic_results + [
    doc for doc in keyword_results if doc.page_content not in seen
]

This example shows how to combine keyword filtering with semantic similarity search using LangChain and FAISS.

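Both parts are deliberately simple: the keyword search is an unranked substring filter, and the merge just appends keyword-only hits after the semantic results. You can replace either one to suit your needs, for example with a ranked keyword scorer or with a fusion method that interleaves the two result lists by score or by rank.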
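The substring filter above treats every match the same. Hybrid search systems usually rank keyword matches instead, most often with Okapi BM25, which rewards rare query terms and discounts long documents. Below is a minimal pure-Python sketch of BM25 for illustration only: the helper names (`tokenize`, `bm25_scores`, `keyword_rank`) are hypothetical, and `k1=1.5`, `b=0.75` are commonly used defaults, not values taken from LangChain. For real use, `langchain_community` provides a `BM25Retriever` built on the `rank_bm25` package.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; no stemming or stop-word removal
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, texts: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one Okapi BM25 score per text for the given query."""
    tokenized = [tokenize(t) for t in texts]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of texts containing each term at least once
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF with "+1" inside the log so it never goes negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b)
            norm = freq + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def keyword_rank(query: str, texts: list[str], k: int = 3) -> list[str]:
    """Return up to k texts with a positive BM25 score, best first."""
    scores = bm25_scores(query, texts)
    order = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return [texts[i] for i in order[:k] if scores[i] > 0]


texts = [
    "Climate change impacts on polar bears.",
    "Renewable energy sources and benefits.",
    "Effects of global warming on oceans.",
    "Energy consumption trends in 2023.",
    "Polar bear habitats and climate.",
]
print(keyword_rank("polar bear", texts))
# ['Polar bear habitats and climate.', 'Climate change impacts on polar bears.']
```

Because this tokenizer does no stemming, "bear" does not match "bears", so the second result scores on "polar" alone. Production keyword engines typically add stemming or lemmatization.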

Examples
Find the top 3 documents semantically related to "climate change effects".
semantic_results = vectorstore.similarity_search("climate change effects", k=3)
Filter documents containing the word "energy" (case-insensitive).
keyword_results = [doc for doc in docs if "energy" in doc.page_content.lower()]
Combine semantic and keyword results, skipping keyword hits whose text already appears in the semantic results.
seen = {doc.page_content for doc in semantic_results}
combined_results = semantic_results + [doc for doc in keyword_results if doc.page_content not in seen]
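`similarity_search` embeds the query and returns the stored documents whose vectors are closest to it. The sketch below illustrates that idea in plain Python with cosine similarity over hand-made 3-dimensional vectors. The vectors are invented for illustration (real embeddings come from a model such as `OpenAIEmbeddings` and have far more dimensions), the helper names are hypothetical, and the distance metric a real vector store uses depends on how it is configured.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_top_k(query_vec, corpus, k=3):
    """Return the k (text, score) pairs whose vectors are most similar to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented 3-d "embeddings": dimensions loosely mean (climate, energy, animals)
corpus = [
    ("Climate change impacts on polar bears.", [0.8, 0.0, 0.6]),
    ("Renewable energy sources and benefits.", [0.1, 0.9, 0.0]),
    ("Effects of global warming on oceans.", [0.9, 0.1, 0.1]),
    ("Energy consumption trends in 2023.", [0.0, 1.0, 0.0]),
    ("Polar bear habitats and climate.", [0.5, 0.0, 0.8]),
]
query_vec = [1.0, 0.0, 0.0]  # invented vector standing in for the query "climate change"

for text, score in semantic_top_k(query_vec, corpus):
    print(f"{score:.2f}  {text}")
```

With these toy vectors the ocean document ranks first even though it shares no words with "climate change", which is exactly the kind of match a keyword filter misses.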
Sample Program

This program creates a small set of documents, builds semantic embeddings, and performs both semantic and keyword searches. It then combines the results and prints them.

from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Sample documents
docs = [
    Document(page_content="Climate change impacts on polar bears."),
    Document(page_content="Renewable energy sources and benefits."),
    Document(page_content="Effects of global warming on oceans."),
    Document(page_content="Energy consumption trends in 2023."),
    Document(page_content="Polar bear habitats and climate."),
]

# Create embeddings (reads the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings()

# Build an in-memory FAISS index from the documents
vectorstore = FAISS.from_documents(docs, embeddings)

# Keyword search: case-insensitive substring match
def keyword_search(documents, keyword):
    return [doc for doc in documents if keyword.lower() in doc.page_content.lower()]

# Semantic search: the 3 documents whose embeddings are closest to the query
query = "climate change"
semantic_results = vectorstore.similarity_search(query, k=3)

# Keyword search over the original document list
keyword = "energy"
keyword_results = keyword_search(docs, keyword)

# Combine results, dropping keyword hits whose text already appears in the semantic results
# (the store returns new Document objects, so compare by text rather than by object equality)
seen = {doc.page_content for doc in semantic_results}
combined_results = semantic_results + [
    doc for doc in keyword_results if doc.page_content not in seen
]

# Print combined results
for i, doc in enumerate(combined_results, 1):
    print(f"Result {i}: {doc.page_content}")
Important Notes

Hybrid search improves search quality by balancing exact matches and meaning.
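One common way to make that balance explicit is a weighted sum of normalized scores: rescale each method's scores to the 0 to 1 range, then combine them as `alpha * semantic + (1 - alpha) * keyword`. The sketch below is a hypothetical plain-Python helper, not a LangChain API; the document ids and raw scores are made up, and a document missing from one result list simply gets 0 for that method.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them all to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(keyword_scores, semantic_scores, alpha=0.5):
    """Rank doc ids by alpha * semantic + (1 - alpha) * keyword, using normalized scores.

    alpha=1.0 means semantic only; alpha=0.0 means keyword only.
    """
    kw = min_max(keyword_scores) if keyword_scores else {}
    sem = min_max(semantic_scores) if semantic_scores else {}
    fused = {
        doc_id: alpha * sem.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(sem)
    }
    # Highest fused score first; ties broken by id so the order is deterministic
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Made-up raw scores: BM25-style keyword scores and cosine similarities
keyword_scores = {"a": 2.0, "b": 0.5}
semantic_scores = {"b": 0.9, "c": 0.7, "a": 0.1}

print(weighted_fusion(keyword_scores, semantic_scores, alpha=0.8))  # ['b', 'c', 'a']
print(weighted_fusion(keyword_scores, semantic_scores, alpha=0.2))  # ['a', 'b', 'c']
```

Raising `alpha` favors meaning, lowering it favors exact terms; the right value depends on your data and is usually found by testing against real queries.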

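Keyword search is fast and precise for exact terms such as names, product codes, or error messages, but it misses synonyms and paraphrases: a search for "global warming" will not match a document that only says "climate change". Semantic search matches on meaning, but each query must be embedded first (an extra model call), and it can rank exact-term matches lower than you expect.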

Tune how you combine the results, for example how many results each method returns and how much weight each one gets, to fit your application and what your users expect.
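Combining raw scores is tricky because keyword and semantic scores live on different scales. Reciprocal Rank Fusion (RRF) sidesteps this by using only rank positions: each document's fused score is the sum of `1 / (k + rank)` over every list it appears in. The sketch below is a minimal plain-Python version; the function name and document ids are hypothetical, and `k=60` is the constant commonly used with RRF. LangChain's `EnsembleRetriever` applies a weighted variant of this idea when combining several retrievers.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings of doc ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); larger k flattens the gap between top ranks
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_ranking = ["oceans", "polar_bears", "bear_habitats"]
keyword_ranking = ["renewable", "polar_bears"]
print(reciprocal_rank_fusion([semantic_ranking, keyword_ranking]))
# ['polar_bears', 'oceans', 'renewable', 'bear_habitats']
```

Here `polar_bears` comes out on top even though it is first in neither list, because both methods agree it is relevant.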

Summary

Hybrid search mixes keyword and semantic search for better results.

Use keyword search to find exact words and semantic search to find related meanings.

Combining both helps users find what they want more easily.