Embedding models convert words or sentences into lists of numbers (vectors) that computers can compare. Texts with similar meanings get similar vectors, so you can find related text even when the exact words differ.
Embedding models for semantic search in Agentic AI
Introduction

Typical situations where embedding models help:
When you want to find documents that mean the same thing as a search query.
When you need to group similar customer reviews or feedback.
When building chatbots that understand user questions better.
When organizing large collections of articles by topic.
When matching job descriptions with candidate resumes.
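All of these tasks come down to the same operation: embed the texts, then rank them by vector similarity. Here is a minimal sketch of that ranking step, using tiny hand-made vectors in place of real model embeddings (the vectors and document labels are invented purely for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" standing in for real model output
doc_vectors = {
    "fruit review": np.array([0.9, 0.1, 0.0]),
    "tech article": np.array([0.1, 0.9, 0.2]),
    "sports recap": np.array([0.0, 0.2, 0.9]),
}
query_vector = np.array([0.8, 0.2, 0.1])  # pretend embedding of a fruit-related query

# Rank documents by similarity to the query, highest first
ranked = sorted(doc_vectors.items(),
                key=lambda kv: cosine_similarity(query_vector, kv[1]),
                reverse=True)
print(ranked[0][0])  # prints "fruit review" (the vector closest to the query)
```

A real embedding model does exactly this, only with vectors of hundreds of dimensions learned from text.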
Syntax
embedding = model.encode(texts)
# texts is a list of sentences or documents
# embedding is a list of number arrays (vectors), one per text
The model.encode() function converts text into vectors (lists of numbers).
These vectors capture the meaning of the text, not just the words.
Examples
This creates embeddings for two sentences to compare their meanings.
embedding = model.encode(["I love apples", "Apples are tasty"])
Embedding a single search query to find similar texts.
query_embedding = model.encode([query])
# query is a single search sentence

Embedding many documents efficiently by processing in batches.
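The batch_size argument tells the model how many texts to encode per forward pass. The underlying chunking amounts to a simple slicing loop, sketched here (simplified; the library's real encode() also runs the model and handles device placement):

```python
def batched(items, batch_size):
    """Yield successive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

documents = [f"document {i}" for i in range(70)]
chunks = list(batched(documents, 32))
print([len(c) for c in chunks])  # prints [32, 32, 6]
```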
embeddings = model.encode(documents, batch_size=32)

Sample Model
This program uses an embedding model to find which document best matches a search query by meaning.
from sentence_transformers import SentenceTransformer, util

# Load a pre-trained embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample documents
documents = [
    "Machine learning helps computers learn from data.",
    "Artificial intelligence is a broad field.",
    "Deep learning is a part of machine learning.",
    "I love reading about AI advancements."
]

# Create embeddings for documents
doc_embeddings = model.encode(documents, convert_to_tensor=True)

# Query to search
query = "What is machine learning?"
query_embedding = model.encode([query], convert_to_tensor=True)

# Find the most similar document
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=1)

# Get index of best match
best_match_idx = hits[0][0]['corpus_id']

print(f"Query: {query}")
print(f"Best matching document: {documents[best_match_idx]}")
Output

Query: What is machine learning?
Best matching document: Machine learning helps computers learn from data.
Important Notes
Embedding models work well even if the words in the query and documents are different but the meaning is similar.
Using pre-trained models saves time and works well for many languages and topics.
Embedding vectors can be compared using cosine similarity to find how close meanings are.
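Cosine similarity, mentioned above, is the dot product of two vectors divided by the product of their lengths. It ranges from -1 to 1; values near 1 mean the vectors point in nearly the same direction, i.e. the texts have similar meanings. A quick check with made-up vectors (the numbers are illustrative only):

```python
import numpy as np

def cosine(a, b):
    # dot product divided by the product of the vector lengths
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.1, 2.1, 2.9])    # nearly the same direction as a
c = np.array([-3.0, 0.5, -1.0])  # points a different way

print(round(cosine(a, b), 3))  # close to 1.0
print(round(cosine(a, c), 3))  # negative: dissimilar
```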
Summary
Embedding models convert text into numbers that capture meaning.
They help find similar texts even if words differ.
Useful for search, grouping, and understanding text better.