Hybrid search (keyword + semantic) in LangChain - Mini Project: Build & Apply

Hybrid Search with LangChain: Keyword + Semantic

📖 Scenario: You are building a smart search feature for a small document collection. Users want to find documents either by typing keywords or by meaning (semantic search). Combining both methods can return more relevant results than either one alone.

🎯 Goal: Create a hybrid search system with LangChain that first filters documents by keyword and then ranks the remaining documents by their semantic similarity to a query.

📋 What You'll Learn

Create a list of documents with exact text content

Define a keyword to filter documents

Use a LangChain embedding model to turn text into semantic vectors

Combine keyword filtering and semantic similarity ranking

Return the top matching documents

💡 Why This Matters

🌍 Real World

Hybrid search is used in applications such as document search, customer support, and knowledge bases to find relevant information quickly by combining exact word matches with meaning.

💼 Career

Understanding hybrid search with LangChain is valuable for roles in AI, data science, and software development that focus on search engines and natural language processing.

Create the document list

Create a list called documents with these exact strings: 'Langchain is a framework for building applications', 'Semantic search finds meaning', 'Keyword search matches exact words', 'Hybrid search combines both methods'.

# Create the list of documents
# Your code here

Hint: Use a Python list with the exact strings given.

Set the keyword filter

Create a variable called keyword and set it to the string 'search' to filter documents containing this word.

documents = [
    'Langchain is a framework for building applications',
    'Semantic search finds meaning',
    'Keyword search matches exact words',
    'Hybrid search combines both methods'
]

# Set the keyword to filter documents
# Your code here

Hint: Assign the exact string 'search' to the variable keyword.
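
Note that "documents containing this word" is checked as a substring test in this exercise, so 'search' would also match a longer word such as 'research'. If whole-word matching matters, a regular expression with word boundaries is one option. This is a standard-library sketch; the strings in docs are hypothetical examples, not part of the exercise data.

```python
import re

keyword = 'search'
# Hypothetical sample strings (not the exercise documents).
docs = ['Semantic search finds meaning', 'A research note', 'SEARCH tips']

# Case-insensitive substring test, as used in the exercise: 'research' matches too.
substring_hits = [d for d in docs if keyword.lower() in d.lower()]

# Whole-word test: \b is a word boundary; re.escape guards any special characters.
pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
word_hits = [d for d in docs if pattern.search(d)]

print(substring_hits)  # all three strings
print(word_hits)       # ['Semantic search finds meaning', 'SEARCH tips']
```

Either test gives the same result on the four exercise documents, since none of them contain 'search' inside a longer word.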

Filter documents by keyword and embed

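Import OpenAIEmbeddings from langchain.embeddings and create an instance called embedding_model. Filter documents to keep only those that contain keyword (case-insensitive) and store them in filtered_docs. Then create a list embedded_docs by applying embedding_model.embed_query() to each document in filtered_docs. (In newer LangChain releases, OpenAIEmbeddings is imported from the separate langchain_openai package instead; either way it needs an OpenAI API key, typically supplied through the OPENAI_API_KEY environment variable.)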

documents = [
    'Langchain is a framework for building applications',
    'Semantic search finds meaning',
    'Keyword search matches exact words',
    'Hybrid search combines both methods'
]

keyword = 'search'

# Import OpenAIEmbeddings and create embedding_model
# Filter documents by keyword
# Embed filtered documents
# Your code here

Hint: Use list comprehensions to filter and embed the documents, and remember to import the embedding class first.
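
A minimal sketch of this step follows. So that it runs without an API key, ToyEmbeddings is a hypothetical stand-in for OpenAIEmbeddings: it exposes the same embed_query() method and returns a list of floats, but it only counts words from a fixed vocabulary and captures no meaning. With the real model you would create embedding_model = OpenAIEmbeddings() instead.

```python
class ToyEmbeddings:
    """Hypothetical stand-in for OpenAIEmbeddings (offline, NOT semantic).

    embed_query() returns one float per vocabulary word: how many times
    that word occurs in the text.
    """

    def __init__(self, vocabulary):
        self.vocabulary = [w.lower() for w in vocabulary]

    def embed_query(self, text):
        words = text.lower().split()
        return [float(words.count(w)) for w in self.vocabulary]


documents = [
    'Langchain is a framework for building applications',
    'Semantic search finds meaning',
    'Keyword search matches exact words',
    'Hybrid search combines both methods',
]
keyword = 'search'

# With the real model this line would be: embedding_model = OpenAIEmbeddings()
embedding_model = ToyEmbeddings(
    ['search', 'semantic', 'keyword', 'hybrid', 'meaning', 'methods']
)

# Keep only documents containing the keyword, lowercasing both sides
# so that 'Search' and 'search' are treated the same.
filtered_docs = [doc for doc in documents if keyword.lower() in doc.lower()]

# One embed_query() call per filtered document gives a list of float vectors.
embedded_docs = [embedding_model.embed_query(doc) for doc in filtered_docs]

print(filtered_docs)
print(len(embedded_docs), len(embedded_docs[0]))  # 3 6
```

LangChain embedding classes also provide embed_documents(), which embeds a whole list in one call (embedding_model.embed_documents(filtered_docs)); the exercise asks for embed_query() per document, which also works.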

Rank filtered documents by semantic similarity

Import cosine_similarity from sklearn.metrics.pairwise. Create a variable query_embedding by embedding the string 'hybrid search' with embedding_model.embed_query(). Compute a list similarities holding the cosine similarity between query_embedding and each vector in embedded_docs; because cosine_similarity expects 2D inputs, wrap each vector in a list, for example cosine_similarity([query_embedding], [vec])[0][0]. Finally, create a list ranked_docs by sorting filtered_docs in descending order of their similarities. Together, these steps combine keyword filtering with semantic ranking.
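
For reference, the cosine similarity between the query vector $\mathbf{q}$ (query_embedding) and a document vector $\mathbf{d}$ (one entry of embedded_docs) is their dot product divided by the product of their lengths:

$$
\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}
$$

It ranges from -1 to 1, and a higher value means the two vectors point in more similar directions. For example, $\mathbf{q} = (1, 1)$ and $\mathbf{d} = (1, 0)$ give $1 / \sqrt{2} \approx 0.707$.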

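# Continues from the previous step: documents, keyword, embedding_model,
# filtered_docs and embedded_docs are already defined.
from langchain.embeddings import OpenAIEmbeddings
from sklearn.metrics.pairwise import cosine_similarity

# Embed the query 'hybrid search'
# Calculate cosine similarities
# Sort filtered_docs by similarity descending
# Your code here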

Hint: Use cosine similarity to rank documents by meaning, then sort them by similarity in descending order.
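
Putting the steps together, here is a self-contained sketch of the whole filter-then-rank pipeline. It reuses a hypothetical ToyEmbeddings stand-in so it runs offline, and it computes cosine similarity with a small hand-written cosine() helper rather than sklearn so that it has no third-party dependencies; a comment shows the equivalent sklearn call from the exercise.

```python
import math


class ToyEmbeddings:
    """Hypothetical stand-in for OpenAIEmbeddings: word counts, not meaning."""

    def __init__(self, vocabulary):
        self.vocabulary = [w.lower() for w in vocabulary]

    def embed_query(self, text):
        words = text.lower().split()
        return [float(words.count(w)) for w in self.vocabulary]


def cosine(a, b):
    """dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


documents = [
    'Langchain is a framework for building applications',
    'Semantic search finds meaning',
    'Keyword search matches exact words',
    'Hybrid search combines both methods',
]
keyword = 'search'
embedding_model = ToyEmbeddings(
    ['search', 'semantic', 'keyword', 'hybrid', 'meaning', 'methods']
)

# Keyword filter (case-insensitive), then embed the surviving documents.
filtered_docs = [doc for doc in documents if keyword.lower() in doc.lower()]
embedded_docs = [embedding_model.embed_query(doc) for doc in filtered_docs]

# Embed the query and score each filtered document against it.
query_embedding = embedding_model.embed_query('hybrid search')
# sklearn equivalent per vector: cosine_similarity([query_embedding], [vec])[0][0]
similarities = [cosine(query_embedding, vec) for vec in embedded_docs]

# Pair each document with its score and sort by score, highest first.
ranked = sorted(zip(filtered_docs, similarities), key=lambda pair: pair[1], reverse=True)
ranked_docs = [doc for doc, _ in ranked]

for doc, score in ranked:
    print(f'{score:.3f}  {doc}')
# 0.816  Hybrid search combines both methods
# 0.500  Keyword search matches exact words
# 0.408  Semantic search finds meaning
```

With these toy word-count vectors the order is driven purely by shared words (the 'Hybrid ...' document contains both query words); a real embedding model scores by meaning, so its numbers, and possibly its order, will differ. Slicing the result, for example ranked_docs[:2], returns the top matching documents.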