Recall & Review
beginner
What are out-of-vocabulary (OOV) words in NLP?
OOV words are words that are not in a model's vocabulary, usually because they never appeared in the training data or were too rare to be kept. Typical examples are new words, rare words, names, and misspellings.
beginner
Why is handling OOV words important in NLP?
Handling OOV words is important because models need to understand or process new words to work well on real-world text, which often contains unseen words.
beginner
Name one simple method to handle OOV words.
One simple method is to replace OOV words with a special token like <UNK> (unknown), so the model treats all unknown words the same way.
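The replacement step can be sketched in a few lines of Python. This is a minimal illustration, not a production tokenizer: the toy corpus, the `build_vocab` and `replace_oov` helper names, and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter

UNK = "<UNK>"  # special token standing in for every unknown word


def build_vocab(sentences, min_count=1):
    """Collect every whitespace-separated word seen at least min_count times."""
    counts = Counter(word for s in sentences for word in s.split())
    return {word for word, count in counts.items() if count >= min_count}


def replace_oov(tokens, vocab):
    """Keep known tokens; map anything outside the vocabulary to <UNK>."""
    return [tok if tok in vocab else UNK for tok in tokens]


# Hypothetical toy training corpus.
vocab = build_vocab(["the cat sat", "the dog ran"])

print(replace_oov("the axolotl sat".split(), vocab))
# ['the', '<UNK>', 'sat']
```

Note that `replace_oov(["axolotl"], vocab)` and `replace_oov(["quokka"], vocab)` return the same list, so the model receives no signal that the two words differ.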
intermediate
How do subword tokenization methods help with OOV words?
Subword tokenization breaks words into smaller units (frequent character sequences such as prefixes, stems, and suffixes), so even if a full word is new, its parts are likely to be in the vocabulary and the model can represent the word through them.
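One way to picture this is a greedy longest-match split against a fixed set of pieces. The sketch below is only an illustration under stated assumptions: the `pieces` set is hand-written and hypothetical, whereas real tokenizers such as BPE or WordPiece learn their vocabulary from data and use their own matching or merge rules.

```python
def subword_tokenize(word, pieces, unk="<UNK>"):
    """Split word into the longest known pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate from the right until it is a known piece.
        while end > start and word[start:end] not in pieces:
            end -= 1
        if end == start:
            # No known piece starts at this position: give up on the word.
            return [unk]
        tokens.append(word[start:end])
        start = end
    return tokens


# Hypothetical, hand-picked subword vocabulary.
pieces = {"un", "break", "able", "re", "play"}

print(subword_tokenize("unbreakable", pieces))  # ['un', 'break', 'able']
print(subword_tokenize("replay", pieces))       # ['re', 'play']
print(subword_tokenize("xyz", pieces))          # ['<UNK>']
```

Even though "unbreakable" is not a piece itself, it is fully covered by known pieces. Vocabularies that also include every single character almost never need the `<UNK>` fallback.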
intermediate
What is the role of character-level models in handling OOV words?
Character-level models read words character by character, so they can build a representation for any word, even one never seen before, as long as its characters are known. This largely removes the OOV problem.
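The input side of a character-level model can be sketched as a mapping from characters to integer ids. This is a simplified illustration: the lowercase ASCII alphabet, the `encode_chars` helper name, and reserving id 0 for unknown characters are assumptions for the example, and the model that would consume these ids is not shown.

```python
import string

# Assumed alphabet: lowercase ASCII letters. Id 0 is reserved for any other character.
char_to_id = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}


def encode_chars(word):
    """Map each character of word to its id, or 0 if the character is unknown."""
    return [char_to_id.get(ch, 0) for ch in word.lower()]


print(encode_chars("cab"))      # [3, 1, 2]
print(encode_chars("axolotl"))  # every character has an id, no <UNK> needed
```

Any word spelled with the known alphabet gets a full encoding, so the unknown case shrinks from whole words to individual unseen characters.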
What does the <UNK> token represent in NLP?
The <UNK> token is used to represent words that are not in the model's vocabulary, i.e., unknown or out-of-vocabulary words.
Which method breaks words into smaller known pieces to handle OOV words?
Subword tokenization splits words into smaller parts, helping models understand new words by their known pieces.
Why might character-level models reduce OOV issues?
Character-level models read words one letter at a time, allowing them to handle any word, even unseen ones.
What is a downside of replacing OOV words with the <UNK> token?
Using a single <UNK> token means the model cannot distinguish between different unknown words, losing their unique meanings.
Which of these is NOT a common way to handle OOV words?
Ignoring OOV words completely is not effective because it loses information; other methods help the model understand or represent them.
Explain what out-of-vocabulary words are and why they pose a challenge in NLP.
Think about words a model never saw during training.
Describe at least two methods to handle out-of-vocabulary words and how they help.
Consider simple replacement and breaking words into parts.