
Sentence-BERT for embeddings in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the shape of the embeddings produced by Sentence-BERT?
Given the following code snippet that uses Sentence-BERT to encode a list of sentences, what is the shape of the resulting embeddings array?
from sentence_transformers import SentenceTransformer
sentences = ['I love machine learning.', 'Sentence-BERT creates embeddings.']
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings.shape)
A. (2, 384)
B. (384, 2)
C. (2, 768)
D. (768, 2)
💡 Hint
The 'all-MiniLM-L6-v2' model produces 384-dimensional embeddings for each sentence.
Model Choice
intermediate
Which Sentence-BERT model is best for fast embedding generation on CPU?
You want to generate sentence embeddings quickly on a CPU with limited memory. Which Sentence-BERT model should you choose?
A. 'bert-large-nli-stsb-mean-tokens' - large and accurate model
B. 'distilbert-base-nli-stsb-mean-tokens' - medium speed model
C. 'all-MiniLM-L6-v2' - small and fast model
D. 'roberta-base-nli-stsb-mean-tokens' - medium size model
💡 Hint
Smaller models with fewer layers run faster on CPU.
Hyperparameter
advanced
Which parameter affects the batch size during Sentence-BERT encoding?
When calling the encode() method of a Sentence-BERT model, which parameter controls how many sentences are processed at once?
A. num_workers
B. max_length
C. device
D. batch_size
💡 Hint
This parameter helps balance speed and memory usage.
Metrics
advanced
What metric is commonly used to evaluate Sentence-BERT embeddings on semantic textual similarity tasks?
Which metric best measures how well Sentence-BERT embeddings capture sentence similarity on datasets like STS Benchmark?
A. F1 score
B. Spearman's rank correlation (rho) between cosine similarities and gold similarity scores
C. Accuracy
D. Mean squared error
💡 Hint
This metric measures rank correlation between predicted and true similarity scores.
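To make the metric concrete, here is a minimal pure-Python sketch of the evaluation idea: score each sentence pair by the cosine similarity of its two embeddings, then compute Spearman's rho between those scores and the human-annotated ones. The toy 2-d vectors and gold scores are made up for illustration, and the helper names (`average_ranks`, `pearson`, `spearman_rho`) are hypothetical, not part of any library. In practice the vectors would come from `model.encode()`.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy embedding pairs (stand-ins for encoded sentence pairs).
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # nearly parallel vectors
    ([1.0, 0.0], [1.0, 1.0]),  # 45 degrees apart
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal vectors
]
gold_scores = [4.8, 3.0, 0.5]  # made-up human similarity ratings

predicted = [cosine_similarity(u, v) for u, v in pairs]
rho = spearman_rho(predicted, gold_scores)
print(rho)  # close to 1.0: predicted and gold scores rank the pairs identically
```

Because only the ranks matter, the cosine scores do not need to be on the same scale as the gold ratings, which is why rank correlation suits this task better than mean squared error.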
🔧 Debug
expert
Why does this Sentence-BERT encoding code raise a RuntimeError?
Consider this code snippet:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ['Hello world'] * 10000
embeddings = model.encode(sentences, device='cuda')
It raises a RuntimeError: CUDA out of memory. What is the best way to fix this?
A. Reduce batch_size in encode() to a smaller number like 32
B. Remove device='cuda' to run on CPU instead
C. Increase the number of sentences processed at once
D. Use a larger GPU with more memory
💡 Hint
Large batches can exceed GPU memory limits.
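The direct fix is to pass a smaller `batch_size` to `encode()`, for example `model.encode(sentences, batch_size=32, device='cuda')`. The sketch below makes the underlying mechanism explicit with a hypothetical helper, `encode_in_chunks`, which feeds the encoder a bounded number of sentences at a time. The `fake_encode` function is a stand-in so the example runs without a GPU or a model download. With a real model you might pass something like `lambda batch: model.encode(batch, device='cuda')` instead.

```python
def encode_in_chunks(encode_fn, sentences, chunk_size=32):
    """Encode sentences chunk by chunk so at most chunk_size inputs are processed at once.

    encode_fn is any callable that maps a list of sentences to a sequence
    of embeddings, one per sentence (hypothetical interface for this sketch).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    embeddings = []
    for start in range(0, len(sentences), chunk_size):
        chunk = sentences[start:start + chunk_size]
        embeddings.extend(encode_fn(chunk))  # one embedding per sentence in the chunk
    return embeddings

def fake_encode(batch):
    """Stand-in encoder for illustration: a 1-d 'embedding' holding the sentence length."""
    return [[float(len(sentence))] for sentence in batch]

sentences = ['Hello world'] * 100
vectors = encode_in_chunks(fake_encode, sentences, chunk_size=32)
print(len(vectors))  # one embedding per input sentence
```

Smaller chunks lower peak memory at the cost of more encoder calls, so the usual approach is to pick the largest size that still fits on the device.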

Practice

1. What is the main purpose of Sentence-BERT embeddings in NLP?
easy
A. To count the number of words in a sentence
B. To translate sentences into different languages
C. To generate random sentences for data augmentation
D. To convert sentences into numbers that capture their meaning

Solution

  1. Step 1: Understand Sentence-BERT's role

    Sentence-BERT creates embeddings, which are numbers representing sentence meaning.
  2. Step 2: Compare options with Sentence-BERT's function

    Only option D, converting sentences into numbers that capture their meaning, matches Sentence-BERT's purpose. Counting words, translating, and generating sentences are different tasks.
  3. Final Answer:

    To convert sentences into numbers that capture their meaning -> Option D
  4. Quick Check:

    Sentence-BERT embeddings = meaningful numbers [OK]
Hint: Remember: embeddings = numbers capturing meaning [OK]
Common Mistakes:
  • Confusing embeddings with translation
  • Thinking embeddings count words
  • Assuming embeddings generate sentences
2. Which Python code snippet correctly loads a pre-trained Sentence-BERT model using the sentence-transformers library?
easy
A. from sentence_transformers import SentenceTransformer; model = SentenceTransformer('all-MiniLM-L6-v2')
B. import sentence_transformers; model = sentence_transformers.load('all-MiniLM-L6-v2')
C. from transformers import SentenceBert; model = SentenceBert.load('all-MiniLM-L6-v2')
D. import sbert; model = sbert.SentenceTransformer('all-MiniLM-L6-v2')

Solution

  1. Step 1: Recall correct import and model loading syntax

    The sentence-transformers library uses 'from sentence_transformers import SentenceTransformer' and then creates a model instance with the model name.
  2. Step 2: Check each option for correctness

    Option A matches the correct syntax: it imports SentenceTransformer from sentence_transformers and instantiates it with the model name. Options B, C, and D use incorrect imports or non-existent methods.
  3. Final Answer:

    from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2') -> Option A
  4. Quick Check:

    Correct import and model load = import SentenceTransformer from sentence_transformers, then call SentenceTransformer('all-MiniLM-L6-v2') [OK]
Hint: Use 'from sentence_transformers import SentenceTransformer' [OK]
Common Mistakes:
  • Using wrong import statements
  • Calling non-existent load methods
  • Confusing transformers library with sentence-transformers
3. Given the code below, what is the output shape of embeddings?
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ['Hello world', 'How are you?']
embeddings = model.encode(sentences)
print(embeddings.shape)
medium
A. (384, 2)
B. (2, 384)
C. (2, 768)
D. (1, 384)

Solution

  1. Step 1: Understand input and output of model.encode()

    Input is 2 sentences, so output embeddings will have 2 rows, one per sentence.
  2. Step 2: Know embedding dimension of 'all-MiniLM-L6-v2'

    This model produces embeddings of size 384 per sentence.
  3. Final Answer:

    (2, 384) -> Option B
  4. Quick Check:

    2 sentences x 384 dims = (2, 384) [OK]
Hint: Output shape = (number of sentences, embedding size) [OK]
Common Mistakes:
  • Swapping dimensions in output shape
  • Assuming embedding size is 768
  • Forgetting batch size dimension
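The (number of sentences, embedding size) shape follows from how each sentence is reduced to a single fixed-size vector. The toy sketch below assumes mean pooling over token vectors, which is what the "mean-tokens" model names earlier on this page suggest. The real model's tokenization and pooling may differ in detail, and the 4-dimensional token vectors here are invented (the real 'all-MiniLM-L6-v2' embedding size is 384).

```python
def mean_pool(token_vectors):
    """Average a sentence's token vectors into one fixed-size sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[d] for tok in token_vectors) / count for d in range(dim)]

# Two "sentences" with different token counts; toy dimension 4.
sentence_tokens = [
    [[1.0, 0.0, 2.0, 4.0], [3.0, 2.0, 0.0, 0.0]],                        # 2 tokens
    [[0.0, 3.0, 3.0, 3.0], [3.0, 0.0, 3.0, 0.0], [3.0, 3.0, 0.0, 0.0]],  # 3 tokens
]

# One pooled vector per sentence, regardless of how many tokens it had.
embeddings = [mean_pool(tokens) for tokens in sentence_tokens]
shape = (len(embeddings), len(embeddings[0]))
print(shape)  # (2, 4)
```

The first axis counts sentences and the second is the embedding size, so sentence length never changes the output shape.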
4. You run this code but get an error: AttributeError: module 'sentence_transformers' has no attribute 'load'. What is the likely cause?
import sentence_transformers
model = sentence_transformers.load('all-MiniLM-L6-v2')
medium
A. The model file is missing from local directory
B. The model name 'all-MiniLM-L6-v2' is incorrect
C. The sentence_transformers module does not have a 'load' function
D. You need to import SentenceTransformer class explicitly

Solution

  1. Step 1: Analyze the error message

    The error says 'sentence_transformers' has no attribute 'load', meaning 'load' is not a valid function in this module.
  2. Step 2: Understand correct usage

    The correct way is to import SentenceTransformer class and instantiate it with the model name, not use 'load'.
  3. Final Answer:

    The sentence_transformers module does not have a 'load' function -> Option C
  4. Quick Check:

    AttributeError = the module has no attribute named 'load' [OK]
Hint: Use SentenceTransformer(), not load() [OK]
Common Mistakes:
  • Calling non-existent 'load' method
  • Not importing SentenceTransformer class
  • Assuming model loads from local file by default
5. You want to find the most similar sentence to 'I love machine learning' from a list using Sentence-BERT embeddings. Which approach is best?
hard
A. Encode all sentences and query, then find the sentence with highest cosine similarity to the query embedding
B. Count common words between query and each sentence, pick the highest count
C. Use a pre-trained translation model to translate sentences before comparison
D. Encode only the query sentence and compare it to raw text sentences

Solution

  1. Step 1: Understand how Sentence-BERT embeddings are used for similarity

    Sentence-BERT embeddings represent sentence meaning as vectors; similarity is measured by cosine similarity between vectors.
  2. Step 2: Evaluate options for similarity search

    Option A encodes both the candidate sentences and the query, then compares the embeddings with cosine similarity. Option B relies on surface word overlap, option C adds translation that does not measure similarity, and option D compares an embedding with raw text, which is not a meaningful comparison.
  3. Final Answer:

    Encode all sentences and query, then find the sentence with highest cosine similarity to the query embedding -> Option A
  4. Quick Check:

    Embedding + cosine similarity = best similarity search [OK]
Hint: Compare embeddings with cosine similarity for best match [OK]
Common Mistakes:
  • Comparing raw text instead of embeddings
  • Using word count instead of semantic similarity
  • Encoding only query, not all sentences
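The approach in option A can be sketched in a few lines of pure Python. The 3-d vectors below are invented stand-ins for what `model.encode()` would return for the query and the candidate sentences, and `most_similar` is a hypothetical helper name. With real embeddings, the library's `sentence_transformers.util.cos_sim` can replace the hand-written cosine function.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query_vec, candidate_vecs):
    """Return (index, score) of the candidate with the highest cosine similarity."""
    scores = [cosine_similarity(query_vec, vec) for vec in candidate_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Candidate sentences with made-up embeddings (illustration only).
candidates = ['I enjoy studying ML', 'The weather is nice', 'Cats sleep a lot']
candidate_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
]
query_vec = [1.0, 0.2, 0.0]  # stand-in embedding for 'I love machine learning'

best_index, best_score = most_similar(query_vec, candidate_vecs)
print(candidates[best_index])  # I enjoy studying ML
```

Note that the query and the candidates must be encoded by the same model, since cosine similarity is only meaningful between vectors from the same embedding space.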