Sentence transformers are used to convert sentences into vectors. What is the main reason for doing this?
Think about how computers understand text for tasks like search or clustering.
Sentence transformers convert sentences into fixed-size vectors so that similar sentences have vectors close to each other. This helps in tasks like semantic search and clustering.
Given two sentences embedded using a sentence transformer, what is the cosine similarity output?
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ['I love machine learning.', 'Machine learning is great.']
embeddings = model.encode(sentences)
similarity = cosine_similarity([embeddings[0]], [embeddings[1]])[0][0]
print(round(similarity, 2))
These sentences have similar meaning, so expect a high similarity but less than 1.
The cosine similarity between embeddings of similar sentences is close to 1 but not exactly 1. For this pair it is roughly 0.85, though the exact value depends on the model.
You want to build a semantic search engine that balances speed and accuracy on millions of sentences. Which sentence transformer model is best?
Consider the trade-off between speed and accuracy for large datasets.
'all-MiniLM-L6-v2' is optimized for fast embedding generation with good accuracy, making it suitable for large-scale semantic search.
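The retrieval step of such a search engine can be sketched with plain NumPy. Here random unit vectors stand in for real 'all-MiniLM-L6-v2' embeddings (which are 384-dimensional); the corpus size, seed, and "known-relevant document 42" are illustrative assumptions, not part of any real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a corpus of 1,000 encoded sentences, L2-normalized.
corpus = rng.normal(size=(1000, 384))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# A query embedding that is a slightly perturbed copy of document 42.
query = corpus[42] + 0.01 * rng.normal(size=384)
query /= np.linalg.norm(query)

# For unit vectors, the dot product IS the cosine similarity,
# so one matrix-vector product scores the whole corpus.
scores = corpus @ query
top_k = np.argsort(-scores)[:5]  # indices of the 5 best matches

print(top_k[0])   # document 42 should rank first
```

In practice, `model.encode` would produce `corpus` and `query`, and an approximate-nearest-neighbor index (e.g. FAISS) would replace the full dot product once the corpus reaches millions of sentences.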
Sentence transformers use pooling to create sentence embeddings from token embeddings. What happens if you change from mean pooling to max pooling?
Think about how max pooling selects values compared to averaging.
Max pooling picks the highest value per dimension, highlighting strong features, which can make embeddings more sensitive to important words.
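The difference is easy to see on a toy matrix of token embeddings (the 4x3 values below are made up for illustration, not output from a real model):

```python
import numpy as np

# Token embeddings for a 4-token sentence, 3 dimensions each.
tokens = np.array([
    [0.1, 0.9, 0.0],
    [0.2, 0.1, 0.8],
    [0.0, 0.2, 0.1],
    [0.7, 0.0, 0.1],
])

# Mean pooling averages each dimension over all tokens.
mean_pooled = tokens.mean(axis=0)   # approx [0.25, 0.30, 0.25]

# Max pooling keeps the largest value per dimension,
# so a single strongly activated token dominates that dimension.
max_pooled = tokens.max(axis=0)     # [0.7, 0.9, 0.8]

print(mean_pooled)
print(max_pooled)
```

Note how each max-pooled dimension comes from a different token, whereas mean pooling dilutes every token's contribution equally.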
Consider this code snippet:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode('This is a test sentence')
print(embeddings.shape)
What error will this code raise and why?
Check the documentation for the encode method input types and output.
The encode method accepts either a single string or a list of strings and returns a NumPy array. For a single string it returns a 1-D array; with 'all-MiniLM-L6-v2' the shape is (384,). So no error occurs and the shape prints correctly.