Challenge - 5 Problems
Semantic Similarity Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
Intermediate · 2:00 remaining
What is the cosine similarity output between two embeddings?
Given two 3-dimensional embeddings:

embedding1 = [1, 0, 1]
embedding2 = [0, 1, 1]

Calculate the cosine similarity using the formula:

cosine_similarity = (A · B) / (||A|| * ||B||)
NLP
import numpy as np

embedding1 = np.array([1, 0, 1])
embedding2 = np.array([0, 1, 1])
cosine_similarity = np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))
print(round(cosine_similarity, 3))
Attempts:
2 left
💡 Hint
Recall that cosine similarity measures the angle between vectors: the dot product divided by the product of their magnitudes.
❌ Incorrect
The dot product is 1*0 + 0*1 + 1*1 = 1. Each norm is sqrt(2): sqrt(1+0+1) for embedding1 and sqrt(0+1+1) for embedding2. So cosine similarity = 1 / (sqrt(2) * sqrt(2)) = 1/2 = 0.5.
Model Choice
Intermediate · 2:00 remaining
Which embedding model is best for capturing sentence-level semantic similarity?
You want to compare the meaning of full sentences, not just words. Which model is most suitable?
Attempts:
2 left
💡 Hint
Consider models designed to produce embeddings for entire sentences.
❌ Incorrect
Word2Vec and GloVe produce word embeddings, not sentence embeddings. One-hot encoding is sparse and does not capture semantics. Sentence-BERT is designed for sentence-level semantic similarity.
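To see why one-hot encoding fails here, a minimal sketch (using a made-up three-word vocabulary) shows that even two synonyms share no dimensions, so their cosine similarity is exactly 0:

```python
import numpy as np

# Toy vocabulary chosen for illustration -- not from the challenge itself.
vocab = ["car", "automobile", "banana"]

def one_hot(word):
    # Sparse vector with a single 1 at the word's vocabulary index.
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

car, automobile = one_hot("car"), one_hot("automobile")
cos = np.dot(car, automobile) / (np.linalg.norm(car) * np.linalg.norm(automobile))
print(cos)  # 0.0 -- one-hot vectors cannot express that the words are synonyms
```

A dense sentence encoder such as Sentence-BERT avoids this by mapping related sentences to nearby points in the embedding space.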
Hyperparameter
Advanced · 2:00 remaining
Which hyperparameter affects the quality of semantic similarity in embedding training?
When training a Word2Vec model, which hyperparameter most directly influences the semantic quality of embeddings?
Attempts:
2 left
💡 Hint
Think about how much context the model sees around each word.
❌ Incorrect
Window size controls how many neighboring words are treated as context, which directly shapes the semantic relationships the embeddings capture. Other hyperparameters mainly affect training speed or convergence and have a less direct effect on semantic quality.
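A minimal sketch of the idea (a made-up sentence, not gensim's actual training loop): the neighbors a skip-gram model sees for a given center word depend entirely on the window size.

```python
# Hypothetical example sentence for illustration.
tokens = "the quick brown fox jumps over the lazy dog".split()

def context(tokens, centre, window):
    # Neighbors within `window` positions on each side of the centre word.
    left = max(0, centre - window)
    return tokens[left:centre] + tokens[centre + 1:centre + 1 + window]

i = tokens.index("fox")
print(context(tokens, i, 2))  # ['quick', 'brown', 'jumps', 'over']
print(context(tokens, i, 1))  # ['brown', 'jumps']
```

A larger window pulls in broader, more topical context; a smaller one emphasizes tight syntactic relationships.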
Metrics
Advanced · 2:00 remaining
Which metric is best to evaluate semantic similarity between embeddings?
You have two sentence embeddings and want to measure how similar their meanings are. Which metric is most appropriate?
Attempts:
2 left
💡 Hint
Consider a metric that measures the angle between vectors regardless of their length.
❌ Incorrect
Cosine similarity measures the angle between vectors, capturing semantic similarity well. Euclidean distance is sensitive to vector length. MSE is for regression errors. Jaccard similarity is for sets, not vectors.
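A quick sketch with made-up vectors illustrates the point: scaling a vector leaves cosine similarity unchanged, yet makes it look distant by Euclidean distance.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the length

def cosine(u, v):
    # Angle-based similarity: dot product over the product of magnitudes.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(round(cosine(a, b), 6))           # 1.0 -- identical direction
print(round(np.linalg.norm(a - b), 3))  # 3.742 -- yet "far apart" by distance
```

This length-invariance is why cosine similarity is the standard choice for comparing embeddings.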
🔧 Debug
Expert · 3:00 remaining
Why does this semantic similarity code produce a runtime error?
Code snippet:
NLP

import numpy as np

embedding1 = [0.1, 0.3, 0.5]
embedding2 = [0.2, 0.4]
cos_sim = np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))
print(cos_sim)

What causes the error?
Attempts:
2 left
💡 Hint
Check if both vectors have the same number of elements.
❌ Incorrect
The dot product requires vectors of the same length. embedding1 has 3 elements while embedding2 has 2, so np.dot raises a ValueError because the shapes are not aligned.
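A corrected version of the snippet (the third value, 0.6, is a placeholder added for illustration; any value restoring the matching dimensionality would do) runs without error:

```python
import numpy as np

# Both embeddings now have 3 elements, so np.dot no longer raises ValueError.
embedding1 = np.array([0.1, 0.3, 0.5])
embedding2 = np.array([0.2, 0.4, 0.6])  # 0.6 is an assumed placeholder value
cos_sim = np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))
print(round(cos_sim, 3))  # 0.994
```

In practice, embeddings compared this way must come from the same model (and hence the same dimensionality); a shape mismatch like the one above usually signals that two different encoders were mixed.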