Hard · 📝 Application · Q8 of 15
NLP - Word Embeddings
When training a Word2Vec model on a large corpus with many rare words, which configuration helps improve the quality of embeddings for those rare words?
A. Use Skip-gram with a large window size and high vector_size but exclude rare words.
B. Use CBOW (sg=0) with a high min_count value to focus on frequent words only.
C. Use Skip-gram (sg=1) with a low min_count value to include rare words in training.
D. Use CBOW with negative sampling disabled to better learn rare word embeddings.
Step-by-Step Solution
  1. Step 1: Understand Skip-gram vs CBOW

    Skip-gram (sg=1) predicts the surrounding context words from the center word, so every occurrence of a rare word produces several training updates; CBOW (sg=0) averages the context, which washes out infrequent words.
  2. Step 2: Role of min_count

    Lowering min_count keeps rare words in the vocabulary at all: any word occurring fewer than min_count times is discarded before training (gensim's default is min_count=5).
  3. Step 3: Combine settings

    Using Skip-gram with low min_count helps capture rare word semantics effectively.
  4. Final Answer:

    Use Skip-gram (sg=1) with a low min_count value to include rare words in training. -> Option C
  5. Quick Check:

    Skip-gram + low min_count = better rare word embeddings [OK]
Quick Trick: Skip-gram + low min_count captures rare words better [OK]
Common Mistakes:
  • Assuming CBOW is better for rare words
  • Setting min_count too high excludes rare words
  • Disabling negative sampling (negative=0) without enabling hierarchical softmax leaves no output objective, harming training quality
