Word2Vec (CBOW and Skip-gram) in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Difference between CBOW and Skip-gram in Word2Vec

Which statement correctly describes the main difference between the CBOW and Skip-gram models in Word2Vec?

A. CBOW predicts the target word from surrounding context words, while Skip-gram predicts surrounding context words from the target word.
B. CBOW is used only for large datasets, while Skip-gram works only on small datasets.
C. CBOW uses one-hot encoding for words, while Skip-gram uses word embeddings directly as input.
D. CBOW predicts the next word in a sentence, while Skip-gram predicts the previous word.
💡 Hint

Think about which model uses context to predict the center word and which uses the center word to predict context.

Predict Output
intermediate
Output of a simple Skip-gram training step

Given the following simplified Skip-gram forward pass, what will be the shape of the output vector? It holds one unnormalized score per vocabulary word; a softmax over these scores would give the predicted context word probabilities.

import numpy as np

vocab_size = 10
embedding_dim = 5

# Random embedding matrix
embeddings = np.random.rand(vocab_size, embedding_dim)

# One-hot encoded center word (index 3)
center_word = np.zeros(vocab_size)
center_word[3] = 1

# Compute hidden layer (embedding lookup)
hidden = embeddings.T @ center_word  # shape (embedding_dim,)

# Output weights
output_weights = np.random.rand(vocab_size, embedding_dim)

# Compute output layer
output = output_weights @ hidden  # shape ?

print(output.shape)
A. (5,)
B. (10,)
C. (1, 10)
D. (10, 5)
💡 Hint

Consider the matrix multiplication dimensions: output_weights (vocab_size x embedding_dim) times hidden (embedding_dim,).
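
The dimension rule in the hint, together with the softmax step that the snippet leaves out, can be sketched as follows. The sizes `V` and `d`, the fixed seed, and the `softmax` helper are assumptions made for illustration and are not part of the original snippet.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed only for repeatability
V, d = 7, 3                     # illustrative vocabulary size and embedding dimension

hidden = rng.random(d)               # shape (d,)
output_weights = rng.random((V, d))  # shape (V, d)

# (V, d) @ (d,) -> (V,): one raw score per vocabulary word
scores = output_weights @ hidden

def softmax(x):
    """Hypothetical helper: turn raw scores into probabilities."""
    shifted = x - np.max(x)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

probs = softmax(scores)
print(scores.shape, probs.shape)
```

The softmax does not change the shape: it only rescales the scores so that they are positive and sum to 1.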

Model Choice
advanced
Choosing Word2Vec model for rare words

You want to train word embeddings on a small dataset with many rare words. Which Word2Vec model is generally better at learning good embeddings for rare words?

A. CBOW, because it averages context and smooths rare word signals.
B. Skip-gram, because it ignores rare words during training.
C. Skip-gram, because it predicts context from the target word and better captures rare word representations.
D. CBOW, because it uses hierarchical softmax which is faster for rare words.
💡 Hint

Think about which model focuses more on individual target words and their contexts.

Hyperparameter
advanced
Effect of window size in Word2Vec training

In Word2Vec training, what is the effect of increasing the window size parameter?

A. It controls the learning rate decay during training.
B. It decreases the number of context words, focusing on very close neighbors only.
C. It changes the embedding dimension size, making vectors longer.
D. It increases the number of context words considered, capturing broader semantic relationships but may add noise.
💡 Hint

Window size defines how many words around the target word are used as context.
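
To make the hint concrete, here is a minimal sketch of how a fixed window selects context words. The `context_words` helper and the example sentence are assumptions for illustration; real implementations may also vary the effective window during training.

```python
def context_words(tokens, center, window):
    """Return the tokens within `window` positions of index `center`."""
    left = tokens[max(0, center - window):center]
    right = tokens[center + 1:center + 1 + window]
    return left + right

tokens = "the quick brown fox jumps over the lazy dog".split()
center = tokens.index("fox")  # position 3

for window in (1, 2, 4):
    print(window, context_words(tokens, center, window))
```

Comparing the three printed lists shows how the context for the same center word changes with the window.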

Metrics
expert
Evaluating Word2Vec embeddings with analogy task

After training Word2Vec embeddings, you want to evaluate them using the analogy task: "king is to queen as man is to ?". Which procedure correctly answers this analogy from the embeddings, so that accuracy on the task can be measured?

A. Cosine similarity between the vector (queen - king + man) and all other word vectors to find the closest match.
B. Euclidean distance between the vector (king + queen) and (man + woman).
C. Dot product between the embeddings of 'king' and 'queen' only.
D. Mean squared error between predicted and true word indices.
💡 Hint

Think about how analogy tasks use vector arithmetic and similarity measures.
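
The vector arithmetic behind analogy tasks can be sketched with NumPy. The 3-dimensional vectors below are invented toy values, not trained embeddings, and `solve_analogy` is a hypothetical helper; skipping the query words as candidates is a common convention assumed here.

```python
import numpy as np

# Toy 3-d vectors, invented for illustration (axes: royalty, gender, fruit).
vectors = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve_analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by ranking cosine similarity to b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]  # skip the query words
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(solve_analogy("king", "queen", "man"))  # woman
```

Analogy accuracy is then the fraction of test analogies for which the top-ranked candidate matches the expected word.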

Practice

1. What is the main difference between the CBOW and Skip-gram models in Word2Vec?
easy
A. CBOW uses one-hot encoding, Skip-gram uses frequency encoding.
B. CBOW predicts a word based on its context, while Skip-gram predicts context words from a target word.
C. CBOW is used only for sentences, Skip-gram only for paragraphs.
D. CBOW requires labeled data, Skip-gram does not.

Solution

  1. Step 1: Understand CBOW model purpose

    CBOW tries to predict the target word using the surrounding context words.
  2. Step 2: Understand Skip-gram model purpose

    Skip-gram tries to predict the surrounding context words given the target word.
  3. Final Answer:

    CBOW predicts a word based on its context, while Skip-gram predicts context words from a target word. -> Option B
  4. Quick Check:

    CBOW = context to word, Skip-gram = word to context [OK]
Hint: Remember CBOW = context to word, Skip-gram = word to context [OK]
Common Mistakes:
  • Confusing which model predicts context vs. target word
  • Thinking both models do the same prediction
  • Assuming CBOW needs labeled data
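
The two prediction directions can be illustrated by building the training examples each model would see from the same sentence. This is a minimal sketch in plain Python; the helper names and the toy sentence are assumptions, and no actual training is performed.

```python
def cbow_examples(tokens, window):
    """One example per position: (list of context words) -> center word."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, center))
    return examples

def skipgram_examples(tokens, window):
    """One example per (center, context) pair: center word -> one context word."""
    return [(center, ctx)
            for context, center in cbow_examples(tokens, window)
            for ctx in context]

tokens = ["the", "cat", "sat", "down"]
print(cbow_examples(tokens, window=1)[1])       # (['the', 'sat'], 'cat')
print(skipgram_examples(tokens, window=1)[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

CBOW produces one example per word position, while Skip-gram produces one example per (center, context) pair, so the same sentence yields more Skip-gram examples.
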
2. Which of the following is the correct way to initialize a Skip-gram Word2Vec model using the Gensim library (version 4.0 or later) in Python?
easy
A. Word2Vec(sentences, size=100, window=5, sg=0)
B. Word2Vec(sentences, vector_size=100, window=5, sg=0)
C. Word2Vec(sentences, size=100, window=5, sg=1)
D. Word2Vec(sentences, vector_size=100, window=5, sg=1)

Solution

  1. Step 1: Identify correct parameter for Skip-gram

    In Gensim, 'sg=1' sets Skip-gram, 'sg=0' sets CBOW.
  2. Step 2: Use correct parameter names

    Since Gensim 4.0, 'vector_size' replaces 'size' as the name of the embedding dimension parameter.
  3. Final Answer:

    Word2Vec(sentences, vector_size=100, window=5, sg=1) -> Option D
  4. Quick Check:

    sg=1 and vector_size used correctly [OK]
Hint: Use sg=1 for Skip-gram and vector_size for embedding size [OK]
Common Mistakes:
  • Using 'size' instead of 'vector_size' in recent Gensim versions
  • Setting sg=0 which is CBOW, not Skip-gram
  • Confusing sg parameter values
3. A Gensim Word2Vec model is trained with Skip-gram on a typical English corpus. Which of the following is the most plausible output of model.wv.most_similar('king', topn=1)?
medium
A. [('run', similarity_score)]
B. [('apple', similarity_score)]
C. [('queen', similarity_score)]
D. [('car', similarity_score)]

Solution

  1. Step 1: Understand Word2Vec similarity

    most_similar ranks words by the cosine similarity of their vectors, and words that occur in similar contexts end up with similar vectors; 'queen' appears in many of the same contexts as 'king'.
  2. Step 2: Analyze typical English corpus relations

    Words like 'apple', 'car', or 'run' are unrelated to 'king' in meaning or context.
  3. Final Answer:

    [('queen', similarity_score)] -> Option C
  4. Quick Check:

    Among the options, only 'queen' shares contexts with 'king' [OK]
Hint: Words that share contexts with 'king', such as 'queen', rank highest in most_similar [OK]
Common Mistakes:
  • Choosing unrelated words as most similar
  • Confusing syntactic similarity with semantic similarity
  • Expecting exact similarity scores
4. You trained a CBOW Word2Vec model but get an error: KeyError: 'unknown_word' when querying model.wv['unknown_word']. What is the most likely cause and fix?
medium
A. The word was not in training data; retrain with larger corpus or check vocabulary before querying.
B. The model was trained with Skip-gram; switch to CBOW to fix.
C. The vector size is too small; increase vector_size parameter.
D. The window size is too large; reduce window parameter.

Solution

  1. Step 1: Understand KeyError cause

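    KeyError occurs when the queried word is not in the model's vocabulary, either because it never appeared in the training data or because it occurred fewer times than the min_count threshold.
  2. Step 2: Fix by ensuring word presence

    Either add text containing the word to the training data and retrain, or check whether the word is in the vocabulary before querying.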
  3. Final Answer:

    The word was not in training data; retrain with larger corpus or check vocabulary before querying. -> Option A
  4. Quick Check:

    KeyError means word missing in vocabulary [OK]
Hint: Check if word is in vocabulary before querying model vectors [OK]
Common Mistakes:
  • Assuming model type (CBOW/Skip-gram) causes KeyError
  • Changing vector or window size to fix missing word error
  • Ignoring vocabulary check before querying
5. You want to train a Word2Vec model to capture rare word meanings better. Which approach is best?
hard
A. Use Skip-gram with a smaller window size and increase training epochs.
B. Use CBOW with a large window size and fewer epochs.
C. Use Skip-gram with a large window size and fewer epochs.
D. Use CBOW with a smaller window size and increase training epochs.

Solution

  1. Step 1: Identify model for rare words

    Skip-gram is better at learning rare word representations than CBOW.
  2. Step 2: Adjust window size and epochs

    A smaller window restricts each word's context to its nearest neighbors, which reduces noise from loosely related words; more epochs give the few occurrences of a rare word more training updates.
  3. Final Answer:

    Use Skip-gram with a smaller window size and increase training epochs. -> Option A
  4. Quick Check:

    Skip-gram + small window + more epochs = better rare word capture [OK]
Hint: Skip-gram + small window + more epochs helps rare words [OK]
Common Mistakes:
  • Choosing CBOW for rare word learning
  • Using large window size which dilutes context
  • Reducing epochs which limits training