
Why embeddings capture semantic meaning in NLP - Challenge Your Understanding

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why do word embeddings place similar words close together?

Word embeddings map words to vectors in space. Why do similar words end up close to each other in this space?

A. Because embeddings assign random vectors initially and never update them.
B. Because embeddings are trained to predict context words, so words used in similar contexts get similar vectors.
C. Because embeddings group words by their length rather than meaning.
D. Because embeddings only consider the first letter of each word.
💡 Hint

Think about how words that appear in similar sentences might share meaning.

Predict Output
intermediate
Output of cosine similarity between embeddings

Given two word embeddings represented as vectors, what is the output of the cosine similarity calculation?

import numpy as np

vec1 = np.array([1, 2, 3])
vec2 = np.array([2, 4, 6])

cos_sim = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
print('{:.2f}'.format(cos_sim))
A. 0.87
B. 0.00
C. 1.00
D. 0.50
💡 Hint

Consider if one vector is a scaled version of the other.

Model Choice
advanced
Choosing embedding size for semantic capture

You want to train word embeddings that capture rich semantic meaning. Which embedding size is most likely to work best?

A. Embedding size of 50 dimensions
B. Embedding size of 5000 dimensions
C. Embedding size of 5 dimensions
D. Embedding size of 1 dimension
💡 Hint

Think about balancing detail and overfitting.

Metrics
advanced
Evaluating semantic quality of embeddings

Which metric is best suited to evaluate if embeddings capture semantic similarity between words?

A. Cosine similarity correlation with human similarity scores
B. Mean Squared Error between embedding vectors
C. Accuracy of a classification model using embeddings as input
D. Number of unique words in the vocabulary
💡 Hint

Think about comparing embedding similarity to human judgments.

🔧 Debug
expert
Why does this embedding training code produce identical vectors?

Consider this simplified embedding training snippet. Why do all word vectors end up identical?

import numpy as np

vocab = ['cat', 'dog', 'fish']
embeddings = {word: np.zeros(3) for word in vocab}

for epoch in range(3):
    for word in vocab:
        embeddings[word] += 0.1

print(embeddings)
A. Because numpy arrays cannot be updated in-place.
B. Because the loop only updates the first word's embedding.
C. Because the code resets embeddings to zero inside the loop each time.
D. Because all embeddings start at zero and are incremented by the same scalar, they remain identical vectors.
💡 Hint

Look at how the embeddings are initialized and updated.

Practice

1. Why do word embeddings help computers understand language better?
easy
A. Because they turn words into numbers that show their meaning
B. Because they translate words into different languages
C. Because they count how many times a word appears
D. Because they remove stop words from sentences

Solution

  1. Step 1: Understand what embeddings do

    Embeddings convert words into numbers (vectors) that represent their meanings.
  2. Step 2: Recognize the benefit for computers

    These numbers help computers see which words are similar in meaning by their closeness in vector space.
  3. Final Answer:

    Because they turn words into numbers that show their meaning -> Option A
  4. Quick Check:

    Embeddings = numeric meaning representation [OK]
Hint: Embeddings = words as meaningful numbers [OK]
Common Mistakes:
  • Thinking embeddings translate languages
  • Confusing embeddings with word frequency counts
  • Believing embeddings remove words
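To make "closeness in vector space" concrete, here is a minimal sketch. The three-dimensional vectors are hypothetical values picked for illustration, not the output of a trained model, and `euclidean_distance` and `nearest_word` are helper names introduced only for this example.

```python
import math

# Hypothetical toy embeddings; a trained model would learn far more dimensions.
toy_embeddings = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.78, 0.70, 0.12],
    "banana": [0.05, 0.10, 0.95],
}

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_word(word, embeddings):
    """Return the other word whose vector lies closest to `word`'s vector."""
    others = (w for w in embeddings if w != word)
    return min(
        others,
        key=lambda w: euclidean_distance(embeddings[word], embeddings[w]),
    )

print(nearest_word("king", toy_embeddings))  # queen
```

With these toy values, "king" and "queen" sit next to each other while "banana" is far from both, which is the property that lets a program treat nearby vectors as related words.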
2. Which of the following is the correct way to represent a word embedding vector in code?
easy
A. embedding = 'word vector'
B. embedding = {'word': 1}
C. embedding = 12345
D. embedding = [0.1, 0.5, -0.3]

Solution

  1. Step 1: Identify the data type for embeddings

    Embeddings are numeric vectors, usually lists or arrays of floats.
  2. Step 2: Check each option's format

    embedding = [0.1, 0.5, -0.3] shows a list of numbers, which is correct. Others are strings, integers, or dictionaries, which are incorrect.
  3. Final Answer:

    embedding = [0.1, 0.5, -0.3] -> Option D
  4. Quick Check:

    Embedding vector = list of numbers [OK]
Hint: Embedding = list of numbers, not strings or ints [OK]
Common Mistakes:
  • Using strings instead of numeric vectors
  • Using single numbers instead of vectors
  • Using dictionaries instead of lists
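The four options can also be checked programmatically. The sketch below assumes an embedding is stored as a plain Python list of real numbers with a known dimension; `is_valid_embedding` is a hypothetical helper written for this example, not part of any library.

```python
from numbers import Real

def is_valid_embedding(vector, expected_dim):
    """True if `vector` is a list of `expected_dim` real numbers."""
    return (
        isinstance(vector, list)
        and len(vector) == expected_dim
        # bool counts as a number in Python, so exclude it explicitly.
        and all(isinstance(x, Real) and not isinstance(x, bool) for x in vector)
    )

# The four candidate representations from the question.
candidates = {
    "A": "word vector",
    "B": {"word": 1},
    "C": 12345,
    "D": [0.1, 0.5, -0.3],
}

for label, value in candidates.items():
    print(label, is_valid_embedding(value, expected_dim=3))
```

Only option D passes the check. In practice embeddings are often held in array types rather than lists, so a real validator would likely accept those as well.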
3. Given the following embeddings:
embedding_cat = [0.2, 0.4, 0.6]
embedding_dog = [0.21, 0.39, 0.58]
embedding_car = [0.9, 0.1, 0.2]
Which pair is most semantically similar based on cosine similarity?
medium
A. dog and car
B. cat and car
C. cat and dog
D. All pairs are equally similar

Solution

  1. Step 1: Understand cosine similarity

    Cosine similarity measures how closely two vectors point in the same direction. It ranges from -1 to 1, and a higher value means more similar.
  2. Step 2: Compare vectors

    embedding_cat and embedding_dog point in almost the same direction, so their cosine similarity is above 0.99. embedding_car points elsewhere: its cosine similarity with either cat or dog is only about 0.5.
  3. Final Answer:

    cat and dog -> Option C
  4. Quick Check:

    Closest vectors = most similar words [OK]
Hint: Closest vectors mean similar words [OK]
Common Mistakes:
  • Assuming car is similar to cat or dog
  • Thinking all pairs have same similarity
  • Ignoring vector closeness
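The comparison in the solution can be verified directly. This is a minimal sketch in pure Python using the three vectors from the question; `cosine_similarity` is written out by hand here rather than taken from a library.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

embedding_cat = [0.2, 0.4, 0.6]
embedding_dog = [0.21, 0.39, 0.58]
embedding_car = [0.9, 0.1, 0.2]

pairs = {
    "cat and dog": cosine_similarity(embedding_cat, embedding_dog),
    "cat and car": cosine_similarity(embedding_cat, embedding_car),
    "dog and car": cosine_similarity(embedding_dog, embedding_car),
}

for name, score in pairs.items():
    print(f"{name}: {score:.3f}")

# The pair with the highest score is the most similar one.
print(max(pairs, key=pairs.get))
```

The cat and dog pair scores very close to 1, while both pairs involving car come out near 0.5.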
4. You have this code snippet to compute similarity between two embeddings:
def similarity(vec1, vec2):
    return sum(a*b for a, b in zip(vec1, vec2))

embedding1 = [0.3, 0.5, 0.2]
embedding2 = [0.3, 0.5]
print(similarity(embedding1, embedding2))

What is the main problem here?
medium
A. The vectors have different lengths causing incorrect similarity
B. The function uses sum instead of product
C. The function should return a list, not a number
D. The embeddings contain strings instead of numbers

Solution

  1. Step 1: Check vector lengths

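    embedding1 has 3 elements and embedding2 has 2. zip stops at the shorter input without raising an error, so the last element of embedding1 is silently ignored.
  2. Step 2: Understand impact on similarity

    The function still returns a number (about 0.34 here), but it is computed from only the first two dimensions, so the score is misleading and the mismatch goes unnoticed.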
  3. Final Answer:

    The vectors have different lengths causing incorrect similarity -> Option A
  4. Quick Check:

    Vector length mismatch = wrong similarity [OK]
Hint: Vectors must be same length for similarity [OK]
Common Mistakes:
  • Ignoring vector length mismatch
  • Thinking sum is wrong operation here
  • Expecting list output instead of number
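One way to guard against this, sketched below, is to compare the lengths before multiplying. The name `dot_product` and the error message are choices made for this example.

```python
def dot_product(vec1, vec2):
    """Dot product that refuses vectors of different lengths."""
    if len(vec1) != len(vec2):
        raise ValueError(f"length mismatch: {len(vec1)} vs {len(vec2)}")
    return sum(a * b for a, b in zip(vec1, vec2))

embedding1 = [0.3, 0.5, 0.2]
embedding2 = [0.3, 0.5]

try:
    dot_product(embedding1, embedding2)
except ValueError as err:
    # The mismatch is now reported instead of being silently truncated.
    print("rejected:", err)
```

On Python 3.10 and later, `zip(vec1, vec2, strict=True)` raises a `ValueError` for unequal lengths and can replace the explicit check.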
5. You want to improve a chatbot's understanding by using embeddings. Which approach best captures semantic meaning for similar questions like "How are you?" and "How do you do?"?
hard
A. Use only the first word's embedding as sentence meaning
B. Use pretrained word embeddings and average their vectors for the whole sentence
C. Use random vectors for each word without training
D. Use one-hot encoding for each word and sum them

Solution

  1. Step 1: Understand sentence embedding from word embeddings

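    Averaging pretrained word embeddings gives one vector that approximates the whole sentence's meaning, so sentences built from similar words land close together. Averaging ignores word order, which is a known limitation of this simple approach.
  2. Step 2: Compare other options

    One-hot vectors are orthogonal for every pair of distinct words, so they carry no similarity information; random vectors have no learned meaning; and using only the first word discards the rest of the sentence.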
  3. Final Answer:

    Use pretrained word embeddings and average their vectors for the whole sentence -> Option B
  4. Quick Check:

    Average pretrained embeddings = better sentence meaning [OK]
Hint: Average pretrained embeddings for sentence meaning [OK]
Common Mistakes:
  • Using one-hot encoding which lacks meaning
  • Using random vectors without training
  • Ignoring all words except the first
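The averaging approach can be sketched in a few lines. The vectors in `toy_vectors` are hypothetical stand-ins for pretrained embeddings, invented for this example, and `sentence_vector` is a helper name introduced here; a real system would load vectors from a pretrained model instead.

```python
import math

# Hypothetical toy vectors standing in for pretrained word embeddings.
toy_vectors = {
    "how": [0.9, 0.1, 0.0],
    "are": [0.7, 0.3, 0.1],
    "you": [0.8, 0.2, 0.1],
    "do": [0.6, 0.4, 0.1],
    "buy": [0.1, 0.2, 0.9],
    "cheap": [0.0, 0.3, 0.8],
    "tickets": [0.1, 0.1, 0.9],
}

def sentence_vector(sentence, vectors):
    """Average the vectors of the known words in `sentence`."""
    words = [w.strip("?.!,").lower() for w in sentence.split()]
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words in sentence")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

greeting_a = sentence_vector("How are you?", toy_vectors)
greeting_b = sentence_vector("How do you do?", toy_vectors)
unrelated = sentence_vector("Buy cheap tickets", toy_vectors)

print(f"greetings: {cosine_similarity(greeting_a, greeting_b):.2f}")
print(f"greeting vs unrelated: {cosine_similarity(greeting_a, unrelated):.2f}")
```

With these toy values the two greetings score far higher against each other than against the unrelated sentence, even though they share only two words.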