NLP · ML · ~20 mins

Why embeddings capture semantic meaning in NLP - Challenge Your Understanding

Challenge - 5 Problems
🎖️
Semantic Embeddings Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
2:00 remaining
Why do word embeddings place similar words close together?

Word embeddings map words to vectors in space. Why do similar words end up close to each other in this space?

A. Because embeddings assign random vectors initially and never update them.
B. Because embeddings are trained to predict context words, so words used in similar contexts get similar vectors.
C. Because embeddings group words by their length rather than meaning.
D. Because embeddings only consider the first letter of each word.
Attempts: 2 left
💡 Hint

Think about how words that appear in similar sentences might share meaning.
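One way to see why shared contexts pull vectors together is a toy co-occurrence count. This is a minimal sketch, not how word2vec is actually trained; the three-sentence corpus and whole-sentence context window are illustrative choices only:

```python
import numpy as np

# Toy corpus: "cat" and "dog" appear in near-identical contexts; "car" does not.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the car drove on the road",
]

# Build a vocabulary and a word-by-word co-occurrence matrix
# (context window = the whole sentence, for simplicity).
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
co = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for w in words:
        for c in words:
            if w != c:
                co[idx[w], idx[c]] += 1

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words sharing contexts ("cat"/"dog") end up closer than words that don't.
print(cos(co[idx["cat"]], co[idx["dog"]]), cos(co[idx["cat"]], co[idx["car"]]))
```

Even with raw counts instead of learned vectors, the "similar contexts → similar vectors" effect is visible, which is the intuition option B relies on.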

Predict Output
intermediate
2:00 remaining
Output of cosine similarity between embeddings

Given two word embeddings represented as vectors, what is the output of the cosine similarity calculation?

import numpy as np

vec1 = np.array([1, 2, 3])
vec2 = np.array([2, 4, 6])

cos_sim = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
print('{:.2f}'.format(cos_sim))
A. 0.87
B. 0.00
C. 1.00
D. 0.50
Attempts: 2 left
💡 Hint

Consider if one vector is a scaled version of the other.
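After attempting the question, you can check your reasoning by running the snippet yourself. Because vec2 is exactly 2 * vec1, the two vectors point in the same direction, and cosine similarity depends only on direction, not magnitude:

```python
import numpy as np

vec1 = np.array([1, 2, 3])
vec2 = np.array([2, 4, 6])  # a scaled copy of vec1 (parallel vectors)

# Cosine similarity ignores magnitude; parallel vectors always score 1.
cos_sim = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
print('{:.2f}'.format(cos_sim))  # → 1.00
```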

Model Choice
advanced
2:00 remaining
Choosing embedding size for semantic capture

You want to train word embeddings that capture rich semantic meaning. Which embedding size is most likely to work best?

A. Embedding size of 50 dimensions
B. Embedding size of 5000 dimensions
C. Embedding size of 5 dimensions
D. Embedding size of 1 dimension
Attempts: 2 left
💡 Hint

Think about balancing detail and overfitting.

Metrics
advanced
2:00 remaining
Evaluating semantic quality of embeddings

Which metric is best suited to evaluate if embeddings capture semantic similarity between words?

A. Cosine similarity correlation with human similarity scores
B. Mean Squared Error between embedding vectors
C. Accuracy of a classification model using embeddings as input
D. Number of unique words in the vocabulary
Attempts: 2 left
💡 Hint

Think about comparing embedding similarity to human judgments.
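A rough sketch of how this evaluation works in practice: score a list of word pairs with cosine similarity, then correlate those scores with human similarity ratings for the same pairs. The numbers below are made up for illustration, not drawn from any real benchmark:

```python
import numpy as np

# Hypothetical word pairs, e.g. (cat, dog), (car, truck), (cat, road), (dog, sky).
model_cos = np.array([0.92, 0.81, 0.35, 0.10])  # cosine similarities from a model
human     = np.array([9.1,  8.4,  3.0,  1.2])   # human ratings on a 0-10 scale

# Pearson correlation between the two score lists; a value near 1 means
# the embeddings rank word pairs much like humans do.
r = np.corrcoef(model_cos, human)[0, 1]
print(round(r, 3))
```

Benchmarks of this kind typically report a rank correlation (Spearman) rather than Pearson, but the idea is the same: compare embedding similarity to human judgments, which is what option A describes.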

🔧 Debug
expert
3:00 remaining
Why does this embedding training code produce identical vectors?

Consider this simplified embedding training snippet. Why do all word vectors end up identical?

import numpy as np

vocab = ['cat', 'dog', 'fish']
embeddings = {word: np.zeros(3) for word in vocab}

for epoch in range(3):
    for word in vocab:
        embeddings[word] += 0.1

print(embeddings)
A. Because numpy arrays cannot be updated in-place.
B. Because the loop only updates the first word's embedding.
C. Because the code resets embeddings to zero inside the loop each time.
D. Because all embeddings start at zero and are incremented by the same scalar, they remain identical vectors.
Attempts: 2 left
💡 Hint

Look at how the embeddings are initialized and updated.
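One possible fix, sketched under two assumptions: embeddings should be initialized with small random values so they start distinct ("symmetry breaking"), and updates should depend on the word. The word-dependent nudge below is a stand-in for real per-word gradients, not an actual training rule:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ['cat', 'dog', 'fish']

# Fix 1: random initialization makes the vectors distinct from the start.
embeddings = {word: rng.normal(scale=0.1, size=3) for word in vocab}

for epoch in range(3):
    for i, word in enumerate(vocab):
        # Fix 2: a word-dependent update (illustrative stand-in for gradients).
        embeddings[word] += 0.1 * (i + 1)

# The vectors are no longer identical.
print(all(not np.allclose(embeddings['cat'], embeddings[w])
          for w in ['dog', 'fish']))  # → True
```

This mirrors why real embedding trainers use random initialization: if every vector starts at the same point and receives the same update, no amount of training can separate them.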