
How to Use Sentence Transformers in Python for NLP Tasks

Use the sentence-transformers Python library to convert sentences into vectors by loading a pre-trained model with SentenceTransformer and calling encode() on your text. These vectors can then be used for tasks like semantic search, clustering, or classification.

Syntax

The main steps to use sentence transformers in Python are:

  • from sentence_transformers import SentenceTransformer: Import the model class.
  • model = SentenceTransformer('model_name'): Load a pre-trained model by name.
  • embeddings = model.encode(sentences): Convert one or more sentences into vector embeddings.

These embeddings are numerical arrays representing the meaning of sentences.

python
from sentence_transformers import SentenceTransformer

# Load a pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode sentences to get embeddings
sentences = ['This is an example sentence', 'Each sentence is converted']
embeddings = model.encode(sentences)

print(embeddings.shape)
Output
(2, 384)

Example

This example shows how to load a model, encode sentences, and compare their similarity using cosine similarity.

python
from sentence_transformers import SentenceTransformer, util

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sentences to encode
sentences = ['I love machine learning', 'I enjoy studying AI', 'The weather is sunny today']

# Get embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Compute cosine similarity between the first two sentences
# (util.cos_sim is the current name; util.pytorch_cos_sim is an older alias for it)
similarity = util.cos_sim(embeddings[0], embeddings[1])

print(f"Similarity between sentence 1 and 2: {similarity.item():.4f}")
Output
Similarity between sentence 1 and 2: 0.7924
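
To make the similarity score less of a black box, here is a minimal sketch of what cosine similarity computes, written in plain Python. The short vectors are made-up stand-ins for real embeddings (actual all-MiniLM-L6-v2 vectors have 384 dimensions), and the helper names cosine_similarity and rank_by_similarity are illustrative only; they are not part of the sentence-transformers API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query_vec, vec))
              for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors (hypothetical values, NOT real model output)
query = [1.0, 2.0, 0.0]
candidates = {
    "sunny weather": [0.0, 0.0, 3.0],  # orthogonal to the query
    "studying AI": [2.0, 4.0, 0.0],    # same direction as the query
}

ranking = rank_by_similarity(query, candidates)
for label, score in ranking:
    print(f"{label}: {score:.4f}")
```

A vector pointing in the same direction as the query scores close to 1, and an orthogonal one scores 0. Vector length does not affect the score, which is why cosine similarity is a common choice for comparing embeddings.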

Common Pitfalls

Common mistakes when using sentence transformers include:

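  • Not installing the sentence-transformers package (pip install sentence-transformers) before use.
  • Passing a single sentence as a string instead of a list to encode(). This runs, but returns a 1D array of shape (384,) rather than a 2D array of shape (1, 384), which can break code that expects one row per sentence.
  • Forgetting that encode() returns NumPy arrays by default. Pass convert_to_tensor=True when later steps, such as similarity functions, expect PyTorch tensors.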
  • Using very large models unnecessarily, which slows down encoding.
python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Pitfall: a bare string works, but returns a 1D array of shape (384,)
# embeddings = model.encode('This is a sentence')

# Better: pass a list, even for one sentence, to get a 2D array
embeddings = model.encode(['This is a sentence'])

print(embeddings.shape)  # (1, 384)
Output
(1, 384)
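
One way to sidestep the string-versus-list pitfall is to normalize the input before calling encode(). The helper below is a hypothetical sketch (as_sentence_list is not a library function). It only wraps a bare string in a list, so it runs without the model installed.

```python
def as_sentence_list(text):
    """Wrap a single string in a list; return any other iterable as a list."""
    if isinstance(text, str):
        return [text]
    return list(text)

single = as_sentence_list("This is a sentence")
batch = as_sentence_list(["First sentence", "Second sentence"])

print(single)
print(batch)
```

Given the behavior described above, calling model.encode(as_sentence_list(x)) should then yield a 2D array whether x is one sentence or many.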

Quick Reference

| Function/Method | Purpose |
|---|---|
| SentenceTransformer('model_name') | Load a pre-trained sentence transformer model |
| model.encode(sentences) | Convert sentences (list) to vector embeddings |
| util.pytorch_cos_sim(vec1, vec2) | Calculate cosine similarity between two vectors |
| convert_to_tensor=True | Option in encode() to get PyTorch tensors for similarity |
| model.encode(['sentence']) | Encode a single sentence as a list to get a 2D array |

Key Takeaways

  • Load a pre-trained model with SentenceTransformer('model_name') before encoding.
  • Always pass a list of sentences to model.encode() for a consistent output shape.
  • Use embeddings for tasks like similarity by computing cosine similarity between vectors.
  • Install the sentence-transformers package via pip before using it.
  • Choose smaller models like 'all-MiniLM-L6-v2' for faster encoding with good accuracy.