Hard · Application · Q9 of 15
NLP - Word Embeddings
You want to combine Word2Vec training with a custom preprocessing step that removes stopwords and lowercases all words. Which approach correctly integrates this with Gensim's Word2Vec?
A. Train Word2Vec first, then remove stopwords from the vocabulary
B. Preprocess sentences first, then pass the cleaned token lists to Word2Vec's <code>sentences</code> parameter
C. Pass raw sentences to Word2Vec and set <code>min_count</code> to remove stopwords
D. Use Word2Vec's built-in stopword removal parameter
Step-by-Step Solution
Solution:
  1. Step 1: Understand preprocessing role

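    Word2Vec builds its vocabulary and vectors from exactly the tokens it is given; Gensim's Word2Vec has no stopword-removal or lowercasing option. Both steps must therefore happen before training, or stopwords and case variants will appear in the vocabulary and in every context window.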
  2. Step 2: Integrate preprocessing with Word2Vec

    Preprocess the text into cleaned token lists, then pass these lists as sentences to Word2Vec for training.
  3. Final Answer:

    Preprocess sentences first, then pass the cleaned token lists to Word2Vec's sentences parameter -> Option B
  4. Quick Check:

    Preprocess before training = B [OK]
Quick Trick: Clean data before training Word2Vec [OK]
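
The two solution steps can be sketched as follows. This is a minimal illustration, assuming Gensim 4.x (where the dimensionality parameter is `vector_size`). The stopword set, the regex tokenizer, and the helper names `preprocess` and `train_word2vec` are hypothetical choices made for this example, not part of the quiz or of Gensim.

```python
import re

# Assumption: a tiny illustrative stopword set. A real project would more
# likely use a fuller list (for example from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "is", "on", "in", "and", "of", "to"}


def preprocess(sentence):
    """Lowercase a raw sentence, tokenize it, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


def train_word2vec(raw_sentences):
    """Clean the sentences first, then pass the token lists to Word2Vec."""
    # Imported here so preprocess() can be used without Gensim installed.
    from gensim.models import Word2Vec

    cleaned = [preprocess(s) for s in raw_sentences]
    cleaned = [toks for toks in cleaned if toks]  # skip sentences left empty
    # Hyperparameters below are arbitrary values for a toy corpus.
    return Word2Vec(
        sentences=cleaned,
        vector_size=50,
        window=2,
        min_count=1,
        workers=1,
        seed=0,
    )


raw_sentences = [
    "The cat sat on the mat.",
    "A Dog chased THE cat in the garden.",
    "The bird is on the fence.",
]

# Step 1 on its own: this is what Word2Vec would receive as `sentences`.
cleaned_sentences = [preprocess(s) for s in raw_sentences]
print(cleaned_sentences)
```

Calling `train_word2vec(raw_sentences)` would then train on the cleaned token lists only, so the model's vocabulary (`model.wv.key_to_index` in Gensim 4.x) should contain neither stopwords nor uppercase variants.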
Common Mistakes:
  • Expecting Word2Vec to remove stopwords automatically
  • Removing stopwords after training
  • Using min_count to remove stopwords: it drops rare words, while stopwords are among the most frequent
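
To see why option C fails, the sketch below mimics the `min_count` rule (ignore words whose total frequency is below the threshold) with a plain `Counter`. The toy corpus and variable names are assumptions for illustration; this approximates the filtering rule and does not call Gensim itself.

```python
from collections import Counter

# Assumption: a toy, already-tokenized corpus that still contains stopwords.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["a", "bird", "sang", "on", "the", "fence"],
]

# Total frequency of every token across the corpus.
counts = Counter(tok for sent in corpus for tok in sent)

# min_count keeps words that occur at least this many times.
min_count = 2
kept = sorted(w for w, c in counts.items() if c >= min_count)
dropped = sorted(w for w, c in counts.items() if c < min_count)

print("kept:", kept)
print("dropped:", dropped)
```

In this toy corpus the frequent stopwords "the" and "on" survive the threshold, while content words such as "dog" and "bird" are discarded, which is the opposite of stopword removal.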
