What is semantic similarity with embeddings in NLP?

Introduction

Semantic similarity measures how close two pieces of text are in meaning, even when they use different words. Typical uses include:

Finding if two sentences mean the same thing in a chatbot.

Grouping similar customer reviews together.

Recommending articles that talk about similar topics.

Checking if a question is already answered in a FAQ.

Matching job descriptions with resumes.

Syntax

embedding1 = model.encode(text1)
embedding2 = model.encode(text2)
similarity = cosine_similarity([embedding1], [embedding2])[0][0]

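Use a pre-trained model to convert each text into an embedding, which is a fixed-length vector of numbers.

Cosine similarity measures the angle between two embedding vectors. It ranges from -1 (pointing in opposite directions) through 0 (unrelated) to 1 (pointing in the same direction).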
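To make that range concrete, here is a minimal sketch of the cosine similarity calculation in plain Python. The short vectors are made-up stand-ins for real embeddings, which usually have hundreds of dimensions, and `cosine` is an illustrative helper rather than a library function.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 2-D vectors (hypothetical, for illustration only)
same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
opposite = cosine([1.0, 2.0], [-1.0, -2.0])       # opposite directions

print(same_direction, orthogonal, opposite)
```

The three results come out at approximately 1, 0, and -1. The first pair also shows that only direction matters: doubling a vector's length does not change the score.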

Examples

Compare two fruit-related sentences to see how similar they are.

embedding1 = model.encode('I love apples')
embedding2 = model.encode('I like oranges')
similarity = cosine_similarity([embedding1], [embedding2])[0][0]

Compare two sentences that describe different animals in similar situations.

embedding1 = model.encode('The cat sits on the mat')
embedding2 = model.encode('A dog lies on the rug')
similarity = cosine_similarity([embedding1], [embedding2])[0][0]
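When there are more than two sentences, the same comparison can be run for every pair. The sketch below assumes the embeddings have already been computed and uses hand-written 3-D vectors as hypothetical placeholders. `cosine` and `similarity_matrix` are illustrative helpers, not library functions.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(vectors):
    """Return an n x n list of lists with the cosine similarity of every pair."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

# Hypothetical toy embeddings; real ones would come from model.encode(sentences)
sentences = ['I love apples', 'I like oranges', 'The stock market fell today']
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9],
]

matrix = similarity_matrix(toy_embeddings)
for sentence, row in zip(sentences, matrix):
    print(sentence, [round(score, 2) for score in row])
```

With real embeddings, passing a list of sentences to `model.encode` and then the resulting array to scikit-learn's `cosine_similarity` gives the same kind of matrix in one call.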

Sample Program

This program uses a pre-trained model to turn sentences into numbers and then finds how close their meanings are using cosine similarity.

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load a small pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Two example sentences
text1 = 'I enjoy reading books about history.'
text2 = 'Books on historical topics are my favorite.'

# Get embeddings
embedding1 = model.encode(text1)
embedding2 = model.encode(text2)

# Compute cosine similarity
similarity = cosine_similarity([embedding1], [embedding2])[0][0]

print(f'Semantic similarity: {similarity:.4f}')
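Checking whether a question is already answered in a FAQ follows the same pattern: compare the new question against every stored question and keep the best score. The following is a rough sketch with hand-written toy vectors in place of real embeddings. `best_match` and the 0.8 threshold are illustrative assumptions, and a suitable threshold depends on the model and the data.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, faq_vecs, threshold=0.8):
    """Return (index, score) of the closest FAQ entry, or (None, score) if below threshold."""
    scores = [cosine(query_vec, v) for v in faq_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]

faq_questions = ['How do I reset my password?', 'Where can I see my invoices?']
faq_vecs = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.2]]   # hypothetical toy embeddings
query_vec = [0.8, 0.2, 0.1]   # stands in for model.encode('I forgot my password')

index, score = best_match(query_vec, faq_vecs)
print(faq_questions[index] if index is not None else 'No close match', round(score, 2))
```

Returning `None` when no score reaches the threshold lets the caller treat the question as new instead of forcing a poor match.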

Important Notes

Embeddings capture meaning beyond exact words, so synonyms and paraphrases usually get high similarity scores.

A cosine similarity close to 1 means the texts are very similar in meaning; a value close to 0 means they are unrelated.

Pre-trained models like 'all-MiniLM-L6-v2' are small and fast, and work well for many general-purpose tasks.

Summary

Semantic similarity uses embeddings to compare meanings of text.

Cosine similarity measures how close two embeddings are.

Pre-trained models make it easy to get embeddings for sentences.

Practice

1. What does semantic similarity with embeddings help us do in natural language processing?

easy

A. Translate text from one language to another

B. Count the number of words in a sentence

C. Measure how similar the meanings of two texts are

D. Generate random sentences

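Solution

Step 1: Understand semantic similarity. Semantic similarity measures how close two pieces of text are in meaning, even if they use different words.

Step 2: Role of embeddings. Embeddings turn each text into a vector of numbers, so the closeness of two texts can be computed, for example with cosine similarity.

Final Answer: C. Measure how similar the meanings of two texts are

Quick Check: Semantic similarity compares meaning; it does not translate text, count words, or generate sentences.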
