How to Use Sentence Transformers in Python for NLP Tasks

Use the sentence-transformers Python library to convert sentences into vectors by loading a pre-trained model with SentenceTransformer and calling encode() on your text. These vectors can then be used for tasks like semantic search, clustering, or classification.

📐

Syntax

The main steps to use sentence transformers in Python are:

from sentence_transformers import SentenceTransformer: Import the model class.
model = SentenceTransformer('model_name'): Load a pre-trained model by name.
embeddings = model.encode(sentences): Convert one or more sentences into vector embeddings.

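These embeddings are numerical arrays that represent the meaning of each sentence. By default, encode() returns a NumPy array with one row per sentence; all-MiniLM-L6-v2 produces 384 values per sentence.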

python

from sentence_transformers import SentenceTransformer

# Load a pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode sentences to get embeddings
sentences = ['This is an example sentence', 'Each sentence is converted']
embeddings = model.encode(sentences)

print(embeddings.shape)

Output

(2, 384)

💻

Example

This example shows how to load a model, encode sentences, and compare their similarity using cosine similarity.

python

from sentence_transformers import SentenceTransformer, util

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sentences to encode
sentences = ['I love machine learning', 'I enjoy studying AI', 'The weather is sunny today']

# Get embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Compute cosine similarity between the first two sentences
# (util.cos_sim is the current name; util.pytorch_cos_sim is an older alias)
similarity = util.cos_sim(embeddings[0], embeddings[1])

print(f"Similarity between sentence 1 and 2: {similarity.item():.4f}")

Output

Similarity between sentence 1 and 2: 0.7924
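To make the similarity score less of a black box, here is a minimal sketch of what cosine similarity computes and how it can rank sentences against a query, which is the core of semantic search. It uses only the standard library. The three-dimensional vectors are made-up stand-ins for real embeddings, and cosine_similarity and rank_by_similarity are hypothetical helper names, not part of sentence-transformers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real sentence embeddings (assumption for illustration)
corpus = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, corpus)
for index, score in ranking:
    print(f"corpus[{index}] score={score:.2f}")
```

With real embeddings you would pass the rows returned by model.encode() instead of the toy vectors. For larger corpora, the library's tensor-based helpers are likely to be much faster than this pure-Python loop.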

⚠️

Common Pitfalls

Common mistakes when using sentence transformers include:

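Not installing the sentence-transformers package before use (pip install sentence-transformers).
Passing a single sentence as a string when later code expects a 2D array. encode() accepts a string, but it then returns a 1D vector instead of a one-row array.
Assuming similarity functions only accept tensors. util.cos_sim also converts NumPy arrays, but encoding with convert_to_tensor=True avoids that extra conversion.
Using very large models unnecessarily, which slows down encoding.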
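For the first pitfall, a small check like the following can give a clearer message than a raw ImportError. This is a sketch: is_installed is a hypothetical helper name, while importlib.util.find_spec comes from the standard library. Note that the pip package name uses a hyphen and the import name uses an underscore.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The import name is sentence_transformers; the pip name is sentence-transformers
if not is_installed("sentence_transformers"):
    print("Missing package. Install it with: pip install sentence-transformers")
```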

python

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Works, but a plain string returns a 1D array of shape (384,)
# embeddings = model.encode('This is a sentence')

# Preferred: pass a list, even for one sentence, to get a 2D array
embeddings = model.encode(['This is a sentence'])

print(embeddings.shape)  # (1, 384)

Output

(1, 384)
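If your code may receive either a string or a list, one option is to normalize the input before encoding so the result always has one row per sentence. The sketch below assumes only that the model object has an encode() method that accepts a list of strings, as SentenceTransformer does. encode_as_batch is a hypothetical helper name.

```python
def encode_as_batch(model, sentences):
    """Encode a string or an iterable of strings, always passing a list to encode()."""
    if isinstance(sentences, str):
        # Wrap a lone sentence so the result keeps a batch dimension
        sentences = [sentences]
    return model.encode(list(sentences))
```

Calling encode_as_batch(model, 'This is a sentence') with a loaded SentenceTransformer should then return one row, the same as passing a one-element list.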

📊

Quick Reference

Function/Method	Purpose
SentenceTransformer('model_name')	Load a pre-trained sentence transformer model
model.encode(sentences)	Convert a list of sentences to vector embeddings (NumPy array by default)
util.cos_sim(vec1, vec2)	Calculate cosine similarity between two vectors (older alias: util.pytorch_cos_sim)
convert_to_tensor=True	Option in encode() that returns a PyTorch tensor instead of a NumPy array
model.encode(['sentence'])	Encode a single sentence wrapped in a list to get a 2D array with one row

✅

Key Takeaways

Load a pre-trained model with SentenceTransformer('model_name') before encoding.

Always pass a list of sentences to model.encode() for consistent output shape.

Compare embeddings with cosine similarity for tasks such as semantic search and clustering.

Install the sentence-transformers package via pip before using it.

Choose smaller models like 'all-MiniLM-L6-v2' for faster encoding with good accuracy.