Challenge - 5 Problems
Word2Vec Mastery
❓ Predict Output
intermediate
What is the output of this Word2Vec training snippet?
Consider the following code that trains a Word2Vec model on a small corpus. What will be the shape of the vector for the word 'king' after training?
NLP
from gensim.models import Word2Vec

# Toy corpus: each inner list is one tokenized sentence.
sentences = [
    ['king', 'queen', 'man', 'woman'],
    ['apple', 'orange', 'fruit'],
    ['king', 'man', 'apple'],
]

# vector_size sets the dimensionality of every word vector.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=10)

vector = model.wv['king']
print(vector.shape)
💡 Hint
Check the vector_size parameter in the Word2Vec constructor.
Explanation
The vector_size parameter sets the dimensionality of the word vectors. Here it is 50, so the vector for every word in the vocabulary, including 'king', has shape (50,), and the snippet prints (50,).
❓ Model Choice
intermediate
Which Word2Vec parameter controls the size of the context window?
You want to train a Word2Vec model and control how many words before and after the target word are considered as context. Which parameter do you set?
💡 Hint
Think about the number of words around the target word.
Explanation
The 'window' parameter defines the maximum distance between the current and predicted word within a sentence, i.e., the context window size. With window=2, up to two words on each side of the target word are used as context.
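The effect of the window size can be sketched in plain Python. This is only an illustration with a hypothetical helper, `context_pairs`, that uses a fixed symmetric window; it is not gensim's implementation, which by default also samples a smaller effective window for each target word.

```python
def context_pairs(sentence, window):
    """Return (target, context) pairs using a fixed symmetric window.

    Hypothetical helper for illustration only, not part of gensim.
    """
    pairs = []
    for i, target in enumerate(sentence):
        # Clamp the window to the sentence boundaries.
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, sentence[j]))
    return pairs


sentence = ["king", "queen", "man", "woman"]

# window=1: each word sees only its immediate neighbours (6 pairs).
print(context_pairs(sentence, window=1))

# window=2: each word sees up to two neighbours on each side (10 pairs).
print(len(context_pairs(sentence, window=2)))
```

A larger window produces more pairs per target word, which tends to capture broader topical relatedness at the cost of more training work.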
❓ Hyperparameter
advanced
Which parameter makes Word2Vec ignore infrequent words during training?
You want to exclude words that appear fewer than a certain number of times in your corpus to speed up training and reduce noise. Which Word2Vec parameter do you adjust?
💡 Hint
This parameter filters out rare words.
Explanation
The 'min_count' parameter sets the minimum frequency threshold for the vocabulary: any word whose total count in the corpus is below min_count is ignored during training.
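The filtering rule can be sketched with a simple frequency count. The helper name `build_vocab` below is hypothetical and the code only mimics the thresholding idea; it makes no claim about how gensim builds its vocabulary internally. The corpus is the one from the first challenge.

```python
from collections import Counter


def build_vocab(sentences, min_count):
    """Keep only words that occur at least `min_count` times in total.

    Hypothetical helper for illustration only, not part of gensim.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word: n for word, n in counts.items() if n >= min_count}


sentences = [
    ["king", "queen", "man", "woman"],
    ["apple", "orange", "fruit"],
    ["king", "man", "apple"],
]

# With min_count=1 all seven distinct words survive.
print(len(build_vocab(sentences, min_count=1)))  # 7

# With min_count=2 only the words that occur twice remain.
print(build_vocab(sentences, min_count=2))  # {'king': 2, 'man': 2, 'apple': 2}
```

On a corpus this small, raising min_count removes most of the vocabulary, so the threshold should be chosen relative to corpus size.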
❓ Metrics
advanced
Which metric can you use to evaluate the quality of Word2Vec embeddings?
After training a Word2Vec model, you want to check if the embeddings capture semantic similarity well. Which metric or method is commonly used?
💡 Hint
Think about measuring similarity between vectors.
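Explanation
Cosine similarity measures how closely two word vectors point in the same direction, independent of their lengths. It ranges from -1 to 1, and a higher value between two word vectors is taken as a sign that the words are semantically similar.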
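As a minimal sketch, cosine similarity can be computed by hand as the dot product divided by the product of the vector norms. The vectors below are made-up toy values, not embeddings from a trained model, and `cosine_similarity` is a hypothetical helper written for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors chosen to show the three characteristic cases.
a = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]    # a scaled by 2
opposite = [-1.0, -2.0, -3.0]       # a scaled by -1
orthogonal = [3.0, 0.0, -1.0]       # dot product with a is 0

print(cosine_similarity(a, same_direction))  # close to 1
print(cosine_similarity(a, opposite))        # close to -1
print(cosine_similarity(a, orthogonal))      # 0.0
```

With a trained gensim model, the same quantity is available through `model.wv.similarity('king', 'queen')`, so a hand-written version is mainly useful for understanding what the number means.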
🔧 Debug
expert
Why does this Word2Vec training code raise a KeyError?
You run this code and get a KeyError when accessing the vector for 'banana'. What is the cause?
NLP
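from gensim.models import Word2Vec

# The third sentence repeats a few words so that they reach min_count=2.
# If no word reached the threshold, the vocabulary would be empty and
# gensim would fail during training, before the lookup below is reached.
sentences = [
    ['apple', 'orange', 'fruit'],
    ['king', 'queen', 'man', 'woman'],
    ['apple', 'king', 'fruit'],
]

model = Word2Vec(sentences, vector_size=20, window=2, min_count=2, epochs=5)

# 'banana' is not in the vocabulary, so this lookup raises KeyError.
vector = model.wv['banana']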
💡 Hint
Check the min_count parameter and the words in the sentences.
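Explanation
'banana' never appears in the training sentences, so it is not in the model's vocabulary, and looking it up in model.wv raises a KeyError. In addition, min_count=2 drops every word that appears fewer than twice, so even a word that occurred once in the corpus would be missing from the vocabulary.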
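One way to avoid the error is to test membership before indexing. The sketch below uses a hypothetical helper, `vector_or_none`, and a plain dictionary as a stand-in for `model.wv` so that it runs without gensim. It assumes the object supports `in` and `[]`, which gensim's `KeyedVectors` does.

```python
def vector_or_none(keyed_vectors, word):
    """Return the vector for `word`, or None if it is out of vocabulary.

    Hypothetical helper; `keyed_vectors` is anything that supports
    `word in keyed_vectors` and `keyed_vectors[word]`.
    """
    if word in keyed_vectors:
        return keyed_vectors[word]
    return None


# Stand-in for model.wv with made-up two-dimensional vectors.
toy_vectors = {"apple": [0.1, 0.2], "king": [0.3, 0.4]}

print(vector_or_none(toy_vectors, "apple"))   # [0.1, 0.2]
print(vector_or_none(toy_vectors, "banana"))  # None
```

With a real model the call would be `vector_or_none(model.wv, 'banana')`. Returning None is only one possible policy; depending on the application, skipping the word or using a fallback vector may fit better.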