Sentence-BERT turns sentences into dense numeric vectors (embeddings) that capture their meaning. This makes comparing and searching sentences fast and accurate.
Sentence-BERT for embeddings in NLP
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ['This is a sentence.', 'This is another sentence.']
embeddings = model.encode(sentences)
The SentenceTransformer class loads a pre-trained model for sentence embeddings.
The encode method converts sentences into fixed-length vectors (embeddings).
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode('Hello world!')
sentences = ['I love apples.', 'I enjoy oranges.']
embeddings = model.encode(sentences)
embedding = model.encode('Quick brown fox jumps over the lazy dog', convert_to_tensor=True)
This program loads a Sentence-BERT model, encodes four sentences into vectors, and calculates how similar the first two sentences are using cosine similarity.
from sentence_transformers import SentenceTransformer
import numpy as np
from numpy.linalg import norm

# Load the pre-trained Sentence-BERT model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sentences to encode
sentences = [
    'Machine learning is fun.',
    'I enjoy learning new things.',
    'The cat sits on the mat.',
    'Artificial intelligence is the future.'
]

# Get embeddings for the sentences
embeddings = model.encode(sentences)

# Show the shape of embeddings
print(f'Embeddings shape: {embeddings.shape}')

# Calculate cosine similarity between the first and second sentence
cos_sim = np.dot(embeddings[0], embeddings[1]) / (norm(embeddings[0]) * norm(embeddings[1]))
print(f'Cosine similarity between sentence 1 and 2: {cos_sim:.4f}')
Sentence-BERT embeddings are fixed-length vectors (commonly 384 dimensions for all-MiniLM-L6-v2, or 768 for larger models).
Cosine similarity measures how close two sentence meanings are, from -1 (opposite) to 1 (same).
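The cosine similarity range described above can be verified with a minimal NumPy sketch; the toy 3-dimensional vectors below are made-up stand-ins for real embeddings, chosen only to illustrate the extremes of the scale.

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical directions score 1, opposite directions -1, orthogonal 0
same = cosine_similarity(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0]))
opposite = cosine_similarity(np.array([1.0, 2.0, 3.0]), np.array([-1.0, -2.0, -3.0]))
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(same, opposite, orthogonal)  # 1.0 -1.0 0.0
```

Real sentence embeddings rarely hit these extremes; semantically related sentences typically land somewhere between 0.3 and 0.9.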
Passing convert_to_tensor=True makes encode return a PyTorch tensor instead of a NumPy array, which avoids an extra conversion step when you do further computation in PyTorch (the library is PyTorch-based).
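As one small illustration of why staying in tensor form helps: PyTorch's built-in cosine similarity operates directly on tensors, so no NumPy round-trip is needed. The vectors here are made-up stand-ins for embeddings returned with convert_to_tensor=True.

```python
import torch

# Stand-in tensors for two sentence embeddings
emb1 = torch.tensor([0.5, 0.5, 0.0])
emb2 = torch.tensor([0.5, 0.5, 0.0])

# Cosine similarity computed directly on tensors
sim = torch.nn.functional.cosine_similarity(emb1, emb2, dim=0)
print(float(sim))  # 1.0 for identical vectors
```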
Sentence-BERT turns sentences into numbers that capture their meaning.
These embeddings help compare, search, and group sentences easily.
Using pre-trained models like 'all-MiniLM-L6-v2' is simple and effective.
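To make the "compare and search" idea above concrete without downloading a model, here is a library-free sketch of embedding-based search: rank a small corpus by cosine similarity to a query vector. The toy 3-dimensional vectors are invented for illustration; in practice each would come from model.encode(...).

```python
import numpy as np

# Hypothetical toy "embeddings" (real Sentence-BERT vectors have 384+ dimensions)
corpus = {
    'Machine learning is fun.':  np.array([0.9, 0.1, 0.0]),
    'The cat sits on the mat.':  np.array([0.0, 0.2, 0.9]),
    'AI will shape the future.': np.array([0.6, 0.5, 0.2]),
}
query = np.array([0.85, 0.2, 0.05])  # stand-in for model.encode('I like ML.')

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank corpus sentences by similarity to the query, highest first
ranked = sorted(corpus, key=lambda s: cos_sim(query, corpus[s]), reverse=True)
print(ranked[0])  # 'Machine learning is fun.'
```

This nearest-neighbor ranking is the core of semantic search; with real embeddings the same loop works unchanged, and libraries just add batching and faster indexing on top.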