
Semantic similarity with embeddings in NLP

Introduction
Semantic similarity measures how close two pieces of text are in meaning, even when they use different words. Common applications include:
Detecting whether two sentences mean the same thing in a chatbot.
Grouping similar customer reviews together.
Recommending articles that talk about similar topics.
Checking if a question is already answered in a FAQ.
Matching job descriptions with resumes.
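The FAQ use case above can be sketched end to end. This is a minimal illustration only: the tiny bag-of-words vectors stand in for real model embeddings (in practice you would call model.encode on each text), and the FAQ entries are made up.

```python
import numpy as np

def tokenize(text):
    # Very rough tokenizer, just for this toy example
    return text.lower().replace('?', '').replace(',', '').split()

def embed(text, vocab):
    # Toy bag-of-words "embedding"; a real system would use model.encode(text)
    words = tokenize(text)
    return np.array([words.count(w) for w in vocab], dtype=float)

def cosine(a, b):
    # Cosine similarity between two 1-D vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

faq = [
    'How do I reset my password?',
    'What payment methods do you accept?',
    'How can I cancel my subscription?',
]
vocab = sorted({w for entry in faq for w in tokenize(entry)})

question = 'how can I reset my password'
q_vec = embed(question, vocab)
scores = [cosine(q_vec, embed(entry, vocab)) for entry in faq]
best = int(np.argmax(scores))
print(faq[best])  # the FAQ entry closest in meaning to the question
```

The same loop works unchanged with real embeddings: swap embed for a pre-trained model's encode call and the nearest FAQ entry is still just the argmax of the cosine scores.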
Syntax
Python
embedding1 = model.encode(text1)
embedding2 = model.encode(text2)
similarity = cosine_similarity([embedding1], [embedding2])[0][0]
Use a pre-trained model to convert text into embeddings (numbers).
Cosine similarity measures how closely two embeddings point in the same direction, ranging from -1 (opposite) to 1 (identical); unrelated texts score near 0.
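The score itself comes from a simple formula: the dot product of the two vectors divided by the product of their lengths. A quick numpy sketch using hand-made vectors (rather than real embeddings) shows the three extreme cases:

```python
import numpy as np

def cosine_similarity_1d(a, b):
    # dot(a, b) / (|a| * |b|): 1 = same direction, 0 = orthogonal, -1 = opposite
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 0.0])
print(cosine_similarity_1d(v, np.array([1.0, 0.0])))   # identical direction -> 1.0
print(cosine_similarity_1d(v, np.array([0.0, 1.0])))   # orthogonal -> 0.0
print(cosine_similarity_1d(v, np.array([-1.0, 0.0])))  # opposite direction -> -1.0
```

Scikit-learn's cosine_similarity computes the same quantity, but for whole batches of vectors at once, which is why the snippets above wrap each embedding in an extra list.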
Examples
Compare two fruit-related sentences to see how similar they are.
Python
embedding1 = model.encode('I love apples')
embedding2 = model.encode('I like oranges')
similarity = cosine_similarity([embedding1], [embedding2])[0][0]
Check similarity between two sentences about pets and places.
Python
embedding1 = model.encode('The cat sits on the mat')
embedding2 = model.encode('A dog lies on the rug')
similarity = cosine_similarity([embedding1], [embedding2])[0][0]
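In practice the raw score from examples like these is usually turned into a yes/no decision with a threshold. The cutoff below (0.7) is an arbitrary choice for illustration, and the vectors are toy stand-ins for model.encode output; real applications tune the threshold on their own data.

```python
import numpy as np

def is_paraphrase(emb1, emb2, threshold=0.7):
    # Treat the pair as "same meaning" when cosine similarity meets the threshold
    score = float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))
    return score >= threshold

# Toy vectors standing in for model.encode(...) outputs
close = np.array([0.9, 0.1, 0.2])
near = np.array([0.8, 0.2, 0.1])
far = np.array([0.1, 0.9, -0.3])

print(is_paraphrase(close, near))  # vectors point the same way -> True
print(is_paraphrase(close, far))   # vectors point different ways -> False
```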
Sample Model
This program uses a pre-trained model to turn sentences into numbers and then finds how close their meanings are using cosine similarity.
Python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load a small pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Two example sentences
text1 = 'I enjoy reading books about history.'
text2 = 'Books on historical topics are my favorite.'

# Get embeddings
embedding1 = model.encode(text1)
embedding2 = model.encode(text2)

# Compute cosine similarity (the function expects 2-D arrays, hence the extra brackets)
similarity = cosine_similarity([embedding1], [embedding2])[0][0]

print(f'Semantic similarity: {similarity:.4f}')
Important Notes
Embeddings capture meaning beyond exact words, so synonyms get high similarity.
Cosine similarity close to 1 means very similar; close to 0 means unrelated.
Pre-trained models like 'all-MiniLM-L6-v2' are fast and good for many tasks.
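When comparing many sentences at once, a common trick is to normalize every embedding to unit length and take one matrix product, instead of calling cosine_similarity pair by pair. The sketch below uses random vectors in place of model.encode output (384 is assumed here as the embedding size, matching 'all-MiniLM-L6-v2'):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for model.encode(list_of_sentences): 4 sentences, 384 dimensions
embeddings = rng.normal(size=(4, 384))

# Normalize each row to unit length; then plain dot products ARE cosine similarities
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
unit = embeddings / norms

sim_matrix = unit @ unit.T  # sim_matrix[i, j] = cosine similarity of sentences i and j
print(np.round(np.diag(sim_matrix), 4))  # self-similarity is always 1.0
```

This gives every pairwise score in one shot, which matters when ranking hundreds of FAQ entries or reviews against a query.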
Summary
Semantic similarity uses embeddings to compare meanings of text.
Cosine similarity measures how close two embeddings are.
Pre-trained models make it easy to get embeddings for sentences.