
Why embeddings capture semantic meaning in LangChain

Introduction

Embeddings turn words or sentences into vectors, lists of numbers that capture their meaning. This lets computers understand and compare ideas much as people do. Typical situations where embeddings help:

When you want to find similar documents or texts quickly.
When building a search tool that understands what you mean, not just exact words.
When grouping or organizing texts by their topics or ideas.
When you want to match questions with the best answers automatically.
When creating chatbots that understand context better.
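Most of these use cases boil down to the same operation: embed everything, then rank by similarity. Below is a minimal sketch of that ranking step using made-up three-number vectors in place of real model output (a real system would call model.embed_query and model.embed_documents, and the document names and values here are invented for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Made-up 3-dimensional vectors standing in for real embeddings
docs = {
    'weather report': [0.9, 0.1, 0.0],
    'pet care tips': [0.1, 0.8, 0.2],
    'dog training': [0.0, 0.7, 0.6],
}
query_vec = [0.05, 0.75, 0.4]  # pretend embedding of 'how do I train my puppy?'

# Rank documents by similarity to the query, most similar first
ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
print(ranked)
```

With these invented vectors, the documents about dogs and pets rank above the unrelated weather report, which is exactly the behavior a semantic search tool relies on.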
Syntax
LangChain
embedding = model.embed_query(text)
# embedding is a list or array of numbers representing the text meaning
Embeddings are usually lists of numbers called vectors.
Texts with similar meanings have embeddings that sit close together in vector space.
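To make "close together in vector space" concrete, here is a toy sketch with invented three-number vectors; real embeddings have hundreds or thousands of dimensions, and these particular values are made up for illustration:

```python
# Invented 3-number 'embeddings'; real models return hundreds or
# thousands of dimensions per text
embedding_cat = [0.82, 0.10, 0.05]
embedding_kitten = [0.80, 0.12, 0.07]
embedding_car = [0.05, 0.90, 0.30]

def squared_distance(a, b):
    # Squared Euclidean distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Related words end up closer together than unrelated ones
print(squared_distance(embedding_cat, embedding_kitten))  # small
print(squared_distance(embedding_cat, embedding_car))     # much larger
```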
Examples
This creates a number vector for the sentence 'I love sunny days'.
LangChain
embedding = model.embed_query('I love sunny days')
Two embeddings for different sentences can be compared to see how similar their meanings are.
LangChain
embedding1 = model.embed_query('Cats are cute')
embedding2 = model.embed_query('Dogs are adorable')
This calculates how close two embeddings are (here using scikit-learn's cosine_similarity), showing how related the sentences are in meaning.
LangChain
similarity = cosine_similarity([embedding1], [embedding2])
Sample Program

This program shows how embeddings turn sentences into numbers and then compares them to find how similar their meanings are. A higher score means the sentences are closer in meaning.

LangChain
from langchain.embeddings import OpenAIEmbeddings  # in newer versions: from langchain_openai import OpenAIEmbeddings
from sklearn.metrics.pairwise import cosine_similarity

# Initialize the embedding model (requires an OpenAI API key in the environment)
model = OpenAIEmbeddings()

# Two example texts
text1 = 'I enjoy walking in the park'
text2 = 'Strolling through the garden is relaxing'

# Get embeddings
embedding1 = model.embed_query(text1)
embedding2 = model.embed_query(text2)

# Calculate similarity
similarity_score = cosine_similarity([embedding1], [embedding2])[0][0]

print(f'Similarity score: {similarity_score:.2f}')
Important Notes

Embeddings capture meaning by looking at how words appear together in many texts.
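This idea, that words appearing in similar contexts have similar meanings, can be illustrated with simple co-occurrence counts over a tiny invented corpus. Real models learn dense vectors rather than raw counts, so this is only a sketch of the intuition:

```python
from collections import Counter

# Tiny invented corpus; real embedding models train on vastly more text
corpus = [
    'the cat sat on the mat',
    'the dog sat on the rug',
    'the car drove on the road',
]

vocab = sorted({w for line in corpus for w in line.split()})

def context_vector(word):
    # Count every other word appearing in the same sentence as `word`
    counts = Counter()
    for line in corpus:
        words = line.split()
        if word in words:
            counts.update(w for w in words if w != word)
    return [counts[v] for v in vocab]

# 'cat' and 'dog' share contexts ('sat', 'on', 'the'), so their
# count vectors overlap more than those of 'cat' and 'car'
print(context_vector('cat'))
print(context_vector('dog'))
```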

They help computers understand language beyond just matching exact words.

Cosine similarity scores range from -1 (opposite) to 1 (very similar); scores for real text embeddings are usually positive.
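Those endpoints follow directly from the cosine similarity formula. A standard-library-only sketch with hand-picked vectors shows each case:

```python
import math

def cos(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(cos([1, 2], [1, 2]))    # ~1.0: same direction, very similar
print(cos([1, 0], [0, 1]))    # 0.0: orthogonal, unrelated
print(cos([1, 2], [-1, -2]))  # ~-1.0: opposite direction
```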

Summary

Embeddings convert text into number vectors that show meaning.

Texts with similar ideas have embeddings close to each other.

Comparing embeddings helps find related or similar content easily.