
Training Word2Vec with Gensim in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this Word2Vec training snippet?
Consider the following code that trains a Word2Vec model on a small corpus. What will be the shape of the vector for the word 'king' after training?
from gensim.models import Word2Vec
sentences = [['king', 'queen', 'man', 'woman'], ['apple', 'orange', 'fruit'], ['king', 'man', 'apple']]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=10)
vector = model.wv['king']
print(vector.shape)
A. (100,)
B. (10,)
C. (50,)
D. (1, 50)
💡 Hint
Check the vector_size parameter in the Word2Vec constructor.
Model Choice
intermediate
Which Word2Vec parameter controls the size of the context window?
You want to train a Word2Vec model and control how many words before and after the target word are considered as context. Which parameter do you set?
A. vector_size
B. epochs
C. min_count
D. window
💡 Hint
Think about the number of words around the target word.
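To see what "words before and after the target word" means in practice, here is a minimal pure-Python sketch. The helper `context_words` is hypothetical and written only for illustration; it is not part of Gensim. It shows the maximum span of context tokens for a given size, whereas real Word2Vec implementations commonly sample a smaller effective span per target word.

```python
def context_words(tokens, index, window):
    """Return the tokens at most `window` positions away from tokens[index]."""
    start = max(0, index - window)
    left = tokens[start:index]                     # words before the target
    right = tokens[index + 1:index + 1 + window]   # words after the target
    return left + right


tokens = ["the", "king", "greets", "the", "queen", "today"]
target = 2  # position of "greets"

for window in (1, 2):
    print(window, context_words(tokens, target, window))
# 1 ['king', 'the']
# 2 ['the', 'king', 'the', 'queen']
```

Near the start or end of a sentence the span is simply cut off, so a target at position 0 only has context on its right.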
Hyperparameter
advanced
Which parameter makes Word2Vec ignore infrequent words during training?
You want to exclude words that appear fewer than a certain number of times in your corpus, to speed up training and reduce noise. Which Word2Vec parameter do you adjust?
A. window
B. min_count
C. vector_size
D. sg
💡 Hint
This parameter filters out rare words.
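Frequency filtering of this kind can be sketched with the standard library alone. The function `build_vocab` below is a hypothetical stand-in written for illustration, not Gensim's internal vocabulary builder; it only assumes that a word is kept when its corpus count reaches a threshold.

```python
from collections import Counter


def build_vocab(sentences, min_count):
    """Keep only the words that occur at least `min_count` times in the corpus."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


sentences = [
    ["king", "queen", "man", "woman"],
    ["apple", "orange", "fruit"],
    ["king", "man", "apple"],
]

print(sorted(build_vocab(sentences, 1)))
# ['apple', 'fruit', 'king', 'man', 'orange', 'queen', 'woman']
print(sorted(build_vocab(sentences, 2)))
# ['apple', 'king', 'man']
```

Raising the threshold from 1 to 2 drops every word that occurs only once, which shrinks the vocabulary from seven words to three on this toy corpus.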
Metrics
advanced
Which metric can you use to evaluate the quality of Word2Vec embeddings?
After training a Word2Vec model, you want to check if the embeddings capture semantic similarity well. Which metric or method is commonly used?
A. Cosine similarity between word vectors
B. Mean squared error on training loss
C. Accuracy on classification labels
D. Confusion matrix of predicted words
💡 Hint
Think about measuring similarity between vectors.
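As a rough illustration of comparing two vectors by the angle between them, the sketch below computes cosine similarity in plain Python. The three vectors are made-up toy values chosen for the example, not output from a trained model. On a trained Gensim model, `model.wv.similarity(word1, word2)` returns the same kind of score.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.75, 0.2]
apple = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen) > cosine_similarity(king, apple))
# True
```

A score near 1 means the vectors point in almost the same direction, while a score near 0 means they are close to orthogonal.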
🔧 Debug
expert
Why does this Word2Vec training code raise a KeyError?
You run this code and get a KeyError when accessing the vector for 'banana'. What is the cause?
from gensim.models import Word2Vec
sentences = [['apple', 'banana', 'fruit'], ['apple', 'orange', 'fruit'], ['king', 'queen', 'man', 'woman']]
model = Word2Vec(sentences, vector_size=20, window=2, min_count=2, epochs=5)
vector = model.wv['banana']
A. 'banana' appears fewer than two times in the corpus, so min_count=2 drops it from the vocabulary
B. The vector_size is too small to store the 'banana' vector
C. The window size is too small to include 'banana'
D. The epochs parameter is too low to train the 'banana' vector
💡 Hint
Check the min_count parameter and the words in the sentences.