
Sentence-BERT for embeddings in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
What is the shape of the embeddings produced by Sentence-BERT?
Given the following code snippet that uses Sentence-BERT to encode a list of sentences, what is the shape of the resulting embeddings array?
from sentence_transformers import SentenceTransformer
sentences = ['I love machine learning.', 'Sentence-BERT creates embeddings.']
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings.shape)
A. (2, 384)
B. (384, 2)
C. (2, 768)
D. (768, 2)
💡 Hint
The 'all-MiniLM-L6-v2' model produces 384-dimensional embeddings for each sentence.
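A quick way to reason about the answer: encode() returns one row per input sentence, and 'all-MiniLM-L6-v2' emits a 384-dimensional vector for each. A minimal shape sketch with NumPy (the zeros array is a stand-in for real model output, so no model download is needed):

```python
import numpy as np

# Stand-in for model.encode(): one vector per sentence,
# 384 dimensions each for 'all-MiniLM-L6-v2'.
sentences = ['I love machine learning.', 'Sentence-BERT creates embeddings.']
EMBEDDING_DIM = 384

embeddings = np.zeros((len(sentences), EMBEDDING_DIM))
print(embeddings.shape)  # (2, 384): rows = sentences, columns = dimensions
```

So the first axis is the number of sentences and the second is the embedding dimensionality, never the other way around.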
Model Choice (intermediate)
Which Sentence-BERT model is best for fast embedding generation on CPU?
You want to generate sentence embeddings quickly on a CPU with limited memory. Which Sentence-BERT model should you choose?
A. 'bert-large-nli-stsb-mean-tokens' - large and accurate model
B. 'distilbert-base-nli-stsb-mean-tokens' - medium speed model
C. 'all-MiniLM-L6-v2' - small and fast model
D. 'roberta-base-nli-stsb-mean-tokens' - medium size model
💡 Hint
Smaller models with fewer layers run faster on CPU.
Hyperparameter (advanced)
Which parameter affects the batch size during Sentence-BERT encoding?
When calling the encode() method of a Sentence-BERT model, which parameter controls how many sentences are processed at once?
A. num_workers
B. max_length
C. device
D. batch_size
💡 Hint
This parameter helps balance speed and memory usage.
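The hint can be made concrete without the library: encode() splits the input list into chunks of batch_size and runs the model on each chunk in turn. A plain-Python sketch of that chunking (the real batching happens inside sentence-transformers; this only illustrates the grouping):

```python
def batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items,
    mirroring how encode() groups sentences before each forward pass."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sentences = [f'sentence {i}' for i in range(10)]
sizes = [len(b) for b in batches(sentences, batch_size=4)]
print(sizes)  # [4, 4, 2]: larger batches run faster but use more memory
```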
Metrics (advanced)
What metric is commonly used to evaluate Sentence-BERT embeddings on semantic textual similarity tasks?
Which metric best measures how well Sentence-BERT embeddings capture sentence similarity on datasets like STS Benchmark?
A. F1 score
B. Cosine similarity correlation (Spearman's rho)
C. Accuracy
D. Mean squared error
💡 Hint
This metric measures rank correlation between predicted and true similarity scores.
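Spearman's rho can be computed by hand: rank both score lists, then compare the rankings. A small self-contained sketch for the no-ties case (the gold and predicted scores below are made-up illustration values, not real STS data):

```python
def ranks(values):
    # Rank positions 1..n (assumes no ties, as in this toy example).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    # Spearman's rho for distinct values: 1 - 6*sum(d^2) / (n*(n^2 - 1)).
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

gold = [0.1, 0.4, 0.6, 0.9]           # hypothetical human similarity scores
predicted = [0.2, 0.3, 0.7, 0.8]      # hypothetical cosine similarities
print(spearman_rho(gold, predicted))  # 1.0: same ranking despite different values
```

Because rho only compares rankings, a model scores perfectly here even though its raw similarities differ from the gold scores, which is exactly why it is preferred over MSE on STS benchmarks.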
🔧 Debug (expert)
Why does this Sentence-BERT encoding code raise a RuntimeError?
Consider this code snippet:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ['Hello world'] * 10000
embeddings = model.encode(sentences, device='cuda')

It raises "RuntimeError: CUDA out of memory". What is the best way to fix this?
A. Reduce batch_size in encode() to a smaller number like 32
B. Remove device='cuda' to run on CPU instead
C. Increase the number of sentences processed at once
D. Use a larger GPU with more memory
💡 Hint
Large batches can exceed GPU memory limits.
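Why batch size drives CUDA memory can be seen with rough arithmetic: activation memory grows linearly with the number of sentences in a batch, so shrinking batch_size shrinks the peak proportionally. A back-of-the-envelope sketch (the sequence length, hidden size, and layer count are assumptions loosely matching a MiniLM-class model; a real run also needs memory for weights and framework overhead):

```python
def activation_mb(batch_size, seq_len=256, hidden=384, layers=6, bytes_per_float=4):
    """Very rough per-forward-pass activation footprint in MB:
    one float32 hidden-state tensor per layer."""
    return batch_size * seq_len * hidden * layers * bytes_per_float / 1e6

# Peak activation memory scales linearly with batch size, which is why
# encode(..., batch_size=32) avoids the OOM without changing the result.
print(activation_mb(10000))  # all 10,000 sentences in one batch
print(activation_mb(32))     # batch_size=32
```

The ratio between the two estimates is exactly 10000/32, which is the whole point: lowering batch_size trades a few more forward passes for a much smaller peak footprint.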