Recall & Review
beginner
What is Word2Vec?
Word2Vec is a method to turn words into numbers (vectors) so that computers can understand the meaning of words based on their context in sentences.
beginner
What are the two main architectures of Word2Vec?
The two main architectures are CBOW (Continuous Bag of Words) which predicts a word from its neighbors, and Skip-gram which predicts neighbors from a word.
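One way to see the difference is to list the training examples each architecture would draw from the same sentence. The sketch below is plain Python, not Gensim's internals, and the helper names cbow_examples and skipgram_examples are made up for illustration. In Gensim itself the choice is made with the sg parameter (sg=0 for CBOW, which is the default, and sg=1 for Skip-gram).

```python
def cbow_examples(tokens, window):
    """Yield (context_words, center_word) pairs: CBOW predicts the center."""
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            yield context, center


def skipgram_examples(tokens, window):
    """Yield (center_word, context_word) pairs: Skip-gram predicts each neighbor."""
    for context, center in cbow_examples(tokens, window):
        for word in context:
            yield center, word


tokens = ["the", "cat", "sat", "down"]
cbow = list(cbow_examples(tokens, window=1))
skip = list(skipgram_examples(tokens, window=1))

print(cbow[1])   # context and center for "cat"
print(skip[:3])  # first few (center, neighbor) pairs
```

CBOW produces one example per position with the whole context as input, while Skip-gram produces one example per (center, neighbor) pair.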
intermediate
In Gensim, how do you start training a Word2Vec model on a list of sentences?
Create a Word2Vec object with parameters such as vector_size and window, call .build_vocab() with your sentences, and then call .train() to train the model. Passing the sentences directly to the Word2Vec constructor performs both steps in one call.
beginner
What does the 'window' parameter control in Word2Vec training?
The 'window' parameter controls how many words before and after the target word are considered as context during training.
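As a rough illustration, the helper below (a hypothetical name, written in plain Python) returns the context words for one target position at two window sizes. Gensim treats window as the maximum distance between the target and a context word, so this shows the widest context it would consider.

```python
def context_words(tokens, index, window):
    """Return up to `window` words on each side of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


tokens = ["the", "quick", "brown", "fox", "jumps"]
narrow = context_words(tokens, 2, window=1)  # neighbors of "brown"
wide = context_words(tokens, 2, window=2)

print(narrow)
print(wide)
```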
intermediate
How can you check the similarity between two words using a trained Word2Vec model in Gensim?
Use the model's .wv.similarity('word1', 'word2') method to get a score showing how similar the two words are based on their vectors.
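The score returned by .wv.similarity() is the cosine similarity of the two word vectors. The sketch below computes it by hand in plain Python; the three vectors are made-up numbers for illustration, not output from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional word vectors.
king = [0.9, 0.1, 0.3]
queen = [0.8, 0.2, 0.3]
apple = [-0.2, 0.9, -0.5]

print(cosine_similarity(king, queen))  # close to 1: similar direction
print(cosine_similarity(king, apple))  # negative: dissimilar direction
```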
Which Gensim method is used to prepare the vocabulary before training Word2Vec?
A. fit()
B. train()
C. build_vocab()
D. init_vocab()
The build_vocab() method scans the sentences to build the vocabulary needed for training.
What does the 'vector_size' parameter specify in Word2Vec?
A. Number of words in the vocabulary
B. Size of the training batch
C. Number of training epochs
D. Length of the word vectors
vector_size sets how many numbers each word vector will have, defining its length.
Which Word2Vec architecture predicts the center word from surrounding words?
A. CBOW
B. RNN
C. Skip-gram
D. Transformer
CBOW (Continuous Bag of Words) predicts the center word from its context words.
How do you save a trained Word2Vec model in Gensim?
A. model.export('filename')
B. model.save('filename')
C. model.write('filename')
D. model.store('filename')
The save() method stores the model to disk for later use.
What type of data does Word2Vec expect for training?
A. List of sentences, each sentence is a list of words
B. Single long string of text
C. Dictionary of word counts
D. List of word vectors
Word2Vec trains on a list of sentences, where each sentence is a list of words (tokens).
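Raw text therefore has to be split into sentences and tokens first. The sketch below uses a deliberately simple regular-expression tokenizer; the function name to_sentences and the splitting rules are assumptions for illustration, and real projects often use a dedicated tokenizer instead.

```python
import re


def to_sentences(text):
    """Split text on sentence punctuation, then lowercase and tokenize each part."""
    sentences = []
    for chunk in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z0-9']+", chunk.lower())
        if tokens:  # skip empty fragments left by the split
            sentences.append(tokens)
    return sentences


raw_text = "The king rules the land. The queen rules the land too!"
sentences = to_sentences(raw_text)
print(sentences)
```

The resulting list of token lists is the shape Word2Vec expects as its sentences argument.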
Explain how to train a Word2Vec model using Gensim starting from raw text data.
Think about the steps from raw text to a trained model.
Describe the difference between CBOW and Skip-gram architectures in Word2Vec.
Focus on what each architecture tries to predict.
Practice
1. What is the main purpose of training a Word2Vec model using Gensim?
easy
A. To count the frequency of words in a text
B. To translate text from one language to another
C. To convert words into meaningful number vectors
D. To remove stop words from a text
Solution
Step 1: Understand Word2Vec's goal
Word2Vec creates number vectors that capture word meanings and relationships.
Step 2: Identify Gensim's role
Gensim provides tools to train Word2Vec models easily on text data.
Final Answer:
To convert words into meaningful number vectors -> Option C
Quick Check:
Word2Vec = word vectors [OK]
Hint: Word2Vec = words to numbers with meaning [OK]
Common Mistakes:
Confusing Word2Vec with word counting
Thinking Word2Vec translates languages
Assuming Word2Vec removes stop words
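2. Which of the following is the correct way to import the Word2Vec class from Gensim?
Final Answer:
from gensim.models import Word2Vec
3. A Word2Vec model is trained with vector_size=10 and min_count=1 on sentences that contain 'king'. What does model.wv['king'] return?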
A. A 10-dimensional numpy array representing 'king'
B. The string 'king'
C. A list of words similar to 'king'
D. An error because 'king' is not in vocabulary
Solution
Step 1: Understand model.wv['word'] output
Accessing model.wv['king'] returns the vector (array) for 'king'.
Step 2: Check training and vocabulary
'king' is in sentences and min_count=1, so it's in vocabulary and has a vector of size 10.
Final Answer:
A 10-dimensional numpy array representing 'king' -> Option A
Quick Check:
model.wv['word'] = vector array [OK]
Hint: model.wv[word] returns vector array [OK]
Common Mistakes:
Expecting a string instead of vector
Confusing with similar words list
Assuming 'king' is missing from vocabulary
4. What is wrong with this code snippet for training Word2Vec?
from gensim.models import Word2Vec
sentences = [['cat', 'dog'], ['mouse', 'rat']]
model = Word2Vec(sentences, size=50, window=3, min_count=1)
model.train(sentences, total_examples=2, epochs=10)
medium
A. min_count must be greater than 1
B. 'train' method is missing required arguments
C. Sentences should be a flat list, not list of lists
D. The parameter 'size' is deprecated; use 'vector_size' instead
Solution
Step 1: Check Word2Vec parameters
Gensim 4.0 and later use 'vector_size' instead of 'size' for the vector dimension; passing 'size' raises a TypeError.
Step 2: Verify other code parts
'train' method usage and sentences format are correct; min_count=1 is valid.
Final Answer:
The parameter 'size' is deprecated; use 'vector_size' instead -> Option D
Quick Check:
Use 'vector_size' not 'size' [OK]
Hint: Use 'vector_size' for dimensions in Gensim 4+ [OK]
Common Mistakes:
Using the old 'size' parameter, which Gensim 4+ rejects with a TypeError
Thinking sentences must be flat list
Believing min_count must be >1
5. You want to train a Word2Vec model on a large text corpus but notice the training is very slow. Which combination of changes can speed up training without losing much quality?
(1) Reduce vector_size from 300 to 100
(2) Increase window size from 5 to 10
(3) Set min_count to 5 instead of 1
(4) Decrease epochs from 10 to 3
hard
A. Apply changes 2 and 4 only
B. Apply changes 1, 3, and 4
C. Apply changes 1 and 3 only
D. Apply all changes 1, 2, 3, and 4
Solution
Step 1: Analyze each change's effect on speed and quality
Reducing vector_size (1) speeds training with slight quality loss. Increasing window (2) slows training and may reduce quality. Increasing min_count (3) removes rare words, speeding training. Decreasing epochs (4) reduces training time but may reduce quality.
Step 2: Choose changes that speed up without much quality loss
Changes 1, 3, and 4 speed training; 2 increases window and slows it, so exclude 2.
Final Answer:
Apply changes 1, 3, and 4 -> Option B
Quick Check:
Lower vector_size and epochs, raise min_count = faster training [OK]
Hint: Lower vector_size and epochs, and raise min_count, to speed up [OK]
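To see why raising min_count helps, the sketch below counts words in a made-up toy corpus and keeps only those that appear at least min_count times, which is the filtering rule Gensim applies when building its vocabulary. The helper name vocabulary is hypothetical.

```python
from collections import Counter

# Toy corpus (made up) with a few rare words.
sentences = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
counts = Counter(word for sentence in sentences for word in sentence)


def vocabulary(counts, min_count):
    """Keep words whose total frequency is at least min_count."""
    return {word for word, count in counts.items() if count >= min_count}


vocab_all = vocabulary(counts, min_count=1)
vocab_frequent = vocabulary(counts, min_count=2)

print(len(vocab_all), len(vocab_frequent))
```

A smaller vocabulary means fewer vectors to update, and words seen only once or twice rarely get useful embeddings anyway.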