
Sentence-BERT for embeddings in NLP

Introduction

Sentence-BERT turns sentences into fixed-length numeric vectors (embeddings) that capture their meaning. This makes comparing and searching sentences faster and more accurate.

It is especially useful:

When you want to find similar sentences in a large collection quickly.
When you need to group sentences by meaning, like sorting customer feedback.
When you are building chatbots that understand user questions better.
When you want to search documents by meaning, not just exact words.
Syntax
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = ['This is a sentence.', 'This is another sentence.']
embeddings = model.encode(sentences)

The SentenceTransformer class loads a pre-trained model for sentence embeddings.

The encode method converts sentences into fixed-length vectors (embeddings).

Examples
Encode a single sentence into an embedding vector.
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode('Hello world!')
Encode a list of sentences at once to get their embeddings.
sentences = ['I love apples.', 'I enjoy oranges.']
embeddings = model.encode(sentences)
Get the embedding as a tensor for faster math operations.
embedding = model.encode('Quick brown fox jumps over the lazy dog', convert_to_tensor=True)
Sample Program

This program loads a Sentence-BERT model, encodes four sentences into vectors, and calculates how similar the first two sentences are using cosine similarity.

from sentence_transformers import SentenceTransformer
import numpy as np

# Load the pre-trained Sentence-BERT model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sentences to encode
sentences = [
    'Machine learning is fun.',
    'I enjoy learning new things.',
    'The cat sits on the mat.',
    'Artificial intelligence is the future.'
]

# Get embeddings for the sentences
embeddings = model.encode(sentences)

# Show the shape of embeddings
print(f'Embeddings shape: {embeddings.shape}')

# Calculate cosine similarity between the first and second sentences
cos_sim = np.dot(embeddings[0], embeddings[1]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
)
print(f'Cosine similarity between sentence 1 and 2: {cos_sim:.4f}')
Important Notes

Sentence-BERT embeddings are fixed-length vectors (usually 384 or 768 numbers).

Cosine similarity measures how close two sentence meanings are, from -1 (opposite) to 1 (same).

Passing convert_to_tensor=True makes encode return a PyTorch tensor, which avoids an extra conversion step when you do further tensor math (for example with util.cos_sim).
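
To see the cosine-similarity boundary values concretely, here is the formula applied to hand-picked 2-D vectors (made up purely for illustration):

```python
import numpy as np

def cosine(a, b):
    # cos_sim(a, b) = (a . b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
print(cosine(a, np.array([1.0, 0.0])))   # 1.0  (same direction)
print(cosine(a, np.array([0.0, 1.0])))   # 0.0  (unrelated)
print(cosine(a, np.array([-1.0, 0.0])))  # -1.0 (opposite)
```

Real sentence embeddings rarely hit these extremes; scores for related sentences typically land well above those for unrelated ones.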

Summary

Sentence-BERT turns sentences into numbers that capture their meaning.

These embeddings help compare, search, and group sentences easily.

Using pre-trained models like 'all-MiniLM-L6-v2' is simple and effective.