
Hybrid search (keyword + semantic) in LangChain - Step-by-Step Execution

Concept Flow - Hybrid search (keyword + semantic)
User Query Input
→ Keyword Search and Semantic Search (both run on the same query)
→ Combine Results
→ Rank & Return Final Results
The user query is sent to both keyword search and semantic search. Their results are then combined and ranked before being returned.
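The flow can be sketched as a small Python function. This is a minimal illustration, not LangChain code. hybrid_search, interleave and the Search/Fuse type aliases are hypothetical names. The two searches are passed in as plain callables that return document IDs (best first), and interleave is only a placeholder fusion that alternates between the lists and skips repeats.

```python
from typing import Callable, List

# Hypothetical signatures for this sketch: a search returns doc IDs, best first.
Search = Callable[[str], List[str]]
Fuse = Callable[[List[List[str]]], List[str]]


def interleave(ranked_lists: List[List[str]]) -> List[str]:
    """Placeholder fusion: alternate between lists, skipping docs already taken."""
    seen, fused = set(), []
    for i in range(max((len(r) for r in ranked_lists), default=0)):
        for ranked in ranked_lists:
            if i < len(ranked) and ranked[i] not in seen:
                seen.add(ranked[i])
                fused.append(ranked[i])
    return fused


def hybrid_search(query: str, keyword_search: Search, semantic_search: Search,
                  fuse: Fuse = interleave, k: int = 3) -> List[str]:
    """User query -> keyword + semantic search -> combine and rank -> top k."""
    keyword_hits = keyword_search(query)    # docs sharing the query's terms
    semantic_hits = semantic_search(query)  # docs close in meaning
    return fuse([keyword_hits, semantic_hits])[:k]


# Stub searches standing in for real retrievers
print(hybrid_search("climate change impact",
                    lambda q: ["Doc1", "Doc2", "Doc3"],
                    lambda q: ["Doc4", "Doc5", "Doc6"]))
# → ['Doc1', 'Doc4', 'Doc2']
```

In a real pipeline the two callables would wrap a keyword retriever and a vector store, and fuse would be a proper rank-fusion function rather than simple interleaving.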
Execution Sample
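from langchain_community.retrievers import BM25Retriever  # needs `pip install rank_bm25`
from langchain_community.vectorstores import FAISS  # needs `pip install faiss-cpu`
from langchain_openai import OpenAIEmbeddings  # needs OPENAI_API_KEY to be set

# Assume `docs` is a list of Document objects to search over
bm25 = BM25Retriever.from_documents(docs, k=3)  # keyword (BM25) retriever, top 3
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())  # semantic (vector) index

query = "climate change impact"

# Perform keyword and semantic search on the same query
results_keyword = bm25.invoke(query)  # replaces the deprecated get_relevant_documents
results_semantic = vectorstore.similarity_search(query, k=3)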
This code runs a keyword (BM25) search and a semantic (vector) search on the same query and collects the top 3 results from each.
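Under the hood, BM25Retriever scores documents with the rank_bm25 package (BM25Okapi by default) and, unless you pass a custom preprocess_func, tokenizes text by splitting on whitespace. As a rough sketch of what Step 2 computes, here is a standalone Okapi BM25 scorer. It is not the library's code: the lowercasing, the always-positive IDF variant (the form Lucene uses), the k1=1.5 and b=0.75 defaults, and the helper names bm25_scores and keyword_search are choices made for this sketch.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every doc for the query (assumes docs is non-empty)."""
    tokenized = [d.lower().split() for d in docs]  # lowercase + whitespace split
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs  # average document length
    df = Counter(term for t in tokenized for term in set(t))  # docs containing each term
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequencies in this doc
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term the doc lacks contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def keyword_search(query: str, docs: List[str], k: int = 3) -> List[int]:
    """Indices of the k highest-scoring docs, best first."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


docs = ["climate change impact on crops", "stock market news",
        "global warming effects", "climate policy"]
print(keyword_search("climate change impact", docs, k=2))  # → [0, 3]
```

Note that "global warming effects" scores 0 here: it shares no words with the query, which is exactly the gap semantic search is meant to fill.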
Execution Table
Step | Action | Input | Output | Notes
1 | Receive user query | "climate change impact" | Query stored | Start with raw user input
2 | Run keyword search | "climate change impact" | Top 3 docs matching keywords | Find docs with exact words
3 | Run semantic search | "climate change impact" | Top 3 docs by meaning | Find docs with similar meaning
4 | Combine results | Keyword + semantic results | 6 docs combined | Merge both result sets
5 | Rank combined results | 6 docs | Final ranked list | Sort by relevance score
6 | Return results | Final ranked list | Displayed to user | Show best matches
7 | Exit | N/A | Search complete | Process ends
💡 All steps complete, final ranked results returned to user
Variable Tracker
Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 5 | Final
query | "climate change impact" | "climate change impact" | "climate change impact" | "climate change impact" | "climate change impact" | "climate change impact"
results_keyword | None | [Doc1, Doc2, Doc3] | [Doc1, Doc2, Doc3] | [Doc1, Doc2, Doc3] | [Doc1, Doc2, Doc3] | [Doc1, Doc2, Doc3]
results_semantic | None | None | [Doc4, Doc5, Doc6] | [Doc4, Doc5, Doc6] | [Doc4, Doc5, Doc6] | [Doc4, Doc5, Doc6]
combined_results | None | None | None | [Doc1, Doc2, Doc3, Doc4, Doc5, Doc6] | [Doc1, Doc2, Doc3, Doc4, Doc5, Doc6] | [Doc1, Doc2, Doc3, Doc4, Doc5, Doc6]
final_results | None | None | None | None | [Doc2, Doc4, Doc1, Doc5, Doc3, Doc6] | [Doc2, Doc4, Doc1, Doc5, Doc3, Doc6]
Key Moments - 3 Insights
Why do we run both keyword and semantic searches instead of just one?
Because keyword search finds documents that contain the query's exact words (see Step 2 in execution_table), while semantic search finds documents with similar meaning even when the wording differs (Step 3). Combining both catches exact-term matches, such as names or rare terms, as well as paraphrases.
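Semantic search compares vectors rather than words. The sketch below assumes hypothetical hand-written 3-dimensional vectors in place of real embeddings (a model such as OpenAIEmbeddings returns vectors with far more dimensions) and ranks documents by cosine similarity to the query vector. "global warming effects" shares no words with "climate change impact", so a keyword scorer gives it nothing, yet its vector sits closest to the query's. Cosine similarity is one common choice here; LangChain's FAISS wrapper defaults to Euclidean distance. The names cosine and semantic_search are illustrative.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_search(query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Doc names ranked by cosine similarity to the query vector, best first."""
    return sorted(doc_vecs, key=lambda name: cosine(query_vec, doc_vecs[name]), reverse=True)[:k]


# Hypothetical toy vectors; real ones would come from an embedding model
query_vec = [0.9, 0.1, 0.0]  # "climate change impact"
doc_vecs = {
    "global warming effects": [0.8, 0.2, 0.1],  # no shared words, similar meaning
    "stock market news": [0.0, 0.1, 0.9],
    "climate policy": [0.6, 0.5, 0.0],
}
print(semantic_search(query_vec, doc_vecs, k=2))  # → ['global warming effects', 'climate policy']
```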
How are the combined results ranked before returning?
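After merging keyword and semantic results (Step 4), the combined list is ranked by a relevance score that takes both the keyword ranking and the semantic ranking into account (Step 5), so the best overall matches appear first. Because BM25 scores and vector similarities are on different scales, a common approach is to fuse the rankings rather than the raw scores, for example with Reciprocal Rank Fusion.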
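One way to implement Step 5 is weighted Reciprocal Rank Fusion (RRF): each document scores the sum of weight / (c + rank) over every list it appears in, so only positions matter and raw BM25 and vector scores never have to be compared. LangChain's EnsembleRetriever combines its retrievers' results this way, with c=60 by default. The plain-Python sketch below illustrates the technique and is not that class's source; reciprocal_rank_fusion is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def reciprocal_rank_fusion(ranked_lists: Sequence[List[str]],
                           weights: Optional[Sequence[float]] = None,
                           c: int = 60) -> List[str]:
    """Fuse ranked lists; each doc scores sum(weight / (c + rank)), ranks start at 1."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)  # equal weight for every list
    scores: Dict[str, float] = defaultdict(float)
    for ranked, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += weight / (c + rank)
    # A doc returned by both searches collects two terms, so agreement is rewarded
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


keyword_hits = ["Doc1", "Doc2", "Doc3"]
semantic_hits = ["Doc2", "Doc4", "Doc1"]
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))  # → ['Doc2', 'Doc1', 'Doc4', 'Doc3']
```

Doc2 comes first because both searches place it near the top; raising one list's weight shifts the result toward that search.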
What happens if the same document appears in both keyword and semantic results?
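Duplicates are removed during combination (Step 4), so each document appears only once in the final ranked list. In this walkthrough the keyword and semantic results do not overlap, which is why Step 4 still yields 6 documents.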
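Step 4 can be sketched as an order-preserving merge that keeps the first occurrence of each document. merge_unique is a hypothetical helper: by default it compares items directly, and the key argument lets you deduplicate on a field instead. For LangChain Document objects that would be something like key=lambda d: d.page_content, which is also what EnsembleRetriever deduplicates on unless told otherwise.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def merge_unique(*result_lists: Iterable[T],
                 key: Callable[[T], Hashable] = lambda doc: doc) -> List[T]:
    """Concatenate result lists, keeping only the first occurrence of each document."""
    seen = set()
    merged: List[T] = []
    for results in result_lists:
        for doc in results:
            doc_key = key(doc)
            if doc_key not in seen:  # skip docs already taken from an earlier list
                seen.add(doc_key)
                merged.append(doc)
    return merged


print(merge_unique(["Doc1", "Doc2", "Doc3"], ["Doc2", "Doc4"]))  # → ['Doc1', 'Doc2', 'Doc3', 'Doc4']
```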
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table: what is the output after Step 3?
A. Top 3 docs matching keywords
B. Top 3 docs by meaning
C. Final ranked list
D. Combined 6 docs
💡 Hint
Check the Output column for Step 3 in execution_table
At which step are the keyword and semantic results combined?
A. Step 4
B. Step 3
C. Step 2
D. Step 5
💡 Hint
Look for the action 'Combine results' in execution_table
If the query changes, which variable in variable_tracker updates first?
A. results_keyword
B. combined_results
C. query
D. final_results
💡 Hint
See variable_tracker: 'query' is set at Start and stays constant afterward.
Concept Snapshot
Hybrid search combines keyword and semantic search.
Keyword search finds exact word matches.
Semantic search finds meaning-based matches.
Results are merged and ranked by relevance.
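This approach improves recall, because documents missed by one method can still be found by the other, and often improves the accuracy of the top results.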
Full Transcript
Hybrid search in LangChain means running both keyword search and semantic search on the same user query. First, the query is taken as input. Then keyword search finds documents containing the exact words, and semantic search finds documents with similar meaning even if the words differ. Both result sets are combined and duplicates are removed. The combined list is ranked by relevance scores that account for both methods. Finally, the best results are returned to the user. This method improves retrieval of relevant documents by covering both exact matches and related meanings.