
Embedding models for semantic search in Agentic AI

Introduction

Embedding models turn words, sentences, or whole documents into vectors: lists of numbers that capture meaning. Texts with similar meanings get similar vectors, so you can find matches even when the exact words are different.

Typical situations where this helps:

When you want to find documents that mean the same thing as a search query.
When you need to group similar customer reviews or feedback.
When building chatbots that understand user questions better.
When organizing large collections of articles by topic.
When matching job descriptions with candidate resumes.
Syntax
embeddings = model.encode(texts)
# texts is a list of sentences or documents
# embeddings is an array of number vectors, one per text

The model.encode() function converts text into vectors (lists of numbers).

These vectors capture the meaning of the text, not just the words.
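To make the shapes concrete, here is a minimal sketch using a toy stand-in for model.encode(). The vectors below are hypothetical 4-dimensional examples; a real model such as 'all-MiniLM-L6-v2' produces 384-dimensional ones.

```python
import numpy as np

# Toy stand-in for model.encode(): a real model computes these vectors
# from the text; here we just look them up to show the shapes involved.
def fake_encode(texts):
    vectors = {
        "I love apples": [0.9, 0.1, 0.0, 0.2],
        "Apples are tasty": [0.8, 0.2, 0.1, 0.3],
        "Stocks fell today": [0.0, 0.9, 0.7, 0.1],
    }
    return np.array([vectors[t] for t in texts])

embeddings = fake_encode(["I love apples", "Apples are tasty"])
print(embeddings.shape)  # one row per text: (2, 4)
```

Note how the two apple sentences get numerically similar rows, which is what later similarity comparisons rely on.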

Examples
This creates embeddings for two sentences to compare their meanings.
embedding = model.encode(["I love apples", "Apples are tasty"])
Embedding a single search query to find similar texts.
query_embedding = model.encode([query])
# query is a single search sentence
Embedding many documents efficiently by processing in batches.
embeddings = model.encode(documents, batch_size=32)
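The batch_size parameter controls how many texts the model processes at once. Conceptually it works like the loop below; this is a sketch of the idea, not the library's actual implementation, and fake_chunk_encoder is a hypothetical stand-in for the model.

```python
def encode_in_batches(texts, encode_chunk, batch_size=32):
    # Process texts in chunks of batch_size and collect the results
    embeddings = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        embeddings.extend(encode_chunk(chunk))
    return embeddings

# Demo with a fake encoder that records the chunk sizes it receives
chunks_seen = []
def fake_chunk_encoder(chunk):
    chunks_seen.append(len(chunk))
    return [[0.0] * 4 for _ in chunk]  # one dummy vector per text

docs = [f"document {i}" for i in range(70)]
vectors = encode_in_batches(docs, fake_chunk_encoder, batch_size=32)
print(len(vectors))   # 70 vectors, one per document
print(chunks_seen)    # [32, 32, 6]
```

Smaller batches use less memory; larger batches are usually faster on a GPU.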
Sample Model

This program uses an embedding model to find which document best matches a search query by meaning.

# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Load a pre-trained embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample documents
documents = [
    "Machine learning helps computers learn from data.",
    "Artificial intelligence is a broad field.",
    "Deep learning is a part of machine learning.",
    "I love reading about AI advancements."
]

# Create embeddings for documents
doc_embeddings = model.encode(documents, convert_to_tensor=True)

# Query to search
query = "What is machine learning?"
query_embedding = model.encode([query], convert_to_tensor=True)

# Find the most similar document
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=1)

# Get index of best match
best_match_idx = hits[0][0]['corpus_id']

print(f"Query: {query}")
print(f"Best matching document: {documents[best_match_idx]}")
Important Notes

Embedding models work well even when the query and the documents use different words, as long as the meaning is similar.

Using pre-trained models saves time and works well for many languages and topics.

Embedding vectors can be compared using cosine similarity to find how close meanings are.
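As a sketch, cosine similarity can be computed directly with NumPy. The vectors below are hypothetical 4-dimensional embeddings; real models produce hundreds of dimensions, but the formula is the same.

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their lengths:
    # close to 1.0 = similar meaning, close to 0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for three sentences
apples_1 = np.array([0.9, 0.1, 0.0, 0.2])  # "I love apples"
apples_2 = np.array([0.8, 0.2, 0.1, 0.3])  # "Apples are tasty"
stocks = np.array([0.0, 0.9, 0.7, 0.1])    # "Stocks fell today"

print(cosine_similarity(apples_1, apples_2))  # high: similar meaning
print(cosine_similarity(apples_1, stocks))    # low: unrelated meaning
```

This is essentially what util.semantic_search does in the sample program: it scores the query vector against every document vector and returns the best matches.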

Summary

Embedding models convert text into numbers that capture meaning.

They help find similar texts even if words differ.

Useful for search, grouping, and understanding text better.