
Handling out-of-vocabulary words in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What are out-of-vocabulary (OOV) words in NLP?
OOV words are words that a model has never seen during training. They are new or rare words that do not appear in the model's vocabulary.
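A minimal sketch of what "not in the vocabulary" means in practice — the vocabulary and sentence here are made up for illustration:

```python
# Toy training-time vocabulary (an assumption for this example).
vocab = {"the", "cat", "sat", "on", "mat"}

def find_oov(tokens, vocab):
    """Return the tokens that never appeared in the training vocabulary."""
    return [t for t in tokens if t not in vocab]

print(find_oov("the cat sat on the zyzzyva".split(), vocab))
# → ['zyzzyva']
```

Any real system would build `vocab` from its training corpus, but the check itself is just this membership test.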
beginner
Why is handling OOV words important in NLP?
Handling OOV words is important because real-world text almost always contains words absent from the training data; a model with no strategy for unseen words will misinterpret or fail on them.
beginner
Name one simple method to handle OOV words.
One simple method is to replace OOV words with a special token like <UNK> (unknown), so the model treats all unknown words the same way.
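A minimal sketch of the `<UNK>` replacement strategy, again with a made-up vocabulary:

```python
UNK = "<UNK>"  # special token standing in for any unseen word
vocab = {"the", "cat", "sat"}

def replace_oov(tokens, vocab, unk=UNK):
    """Map every out-of-vocabulary token to the <UNK> placeholder."""
    return [t if t in vocab else unk for t in tokens]

print(replace_oov(["the", "quokka", "sat"], vocab))
# → ['the', '<UNK>', 'sat']
```

Note the trade-off this makes visible: "quokka" and any other unknown word collapse into the same token, so their individual meaning is lost.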
intermediate
How do subword tokenization methods help with OOV words?
Subword tokenization breaks words into smaller parts (like syllables or character groups), so even if a full word is new, its parts may be known, helping the model understand it.
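A simplified sketch of the idea behind subword tokenization, using a greedy longest-match split over a tiny hand-picked subword vocabulary (real systems such as BPE or WordPiece learn the vocabulary from data; this example does not):

```python
def subword_tokenize(word, subword_vocab):
    """Greedily split a word into the longest known subword pieces."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in subword_vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            return ["<UNK>"]  # no known piece covers this position
    return pieces

subword_vocab = {"un", "happi", "ness", "play", "ing"}
print(subword_tokenize("unhappiness", subword_vocab))
# → ['un', 'happi', 'ness']
```

Even though "unhappiness" as a whole is not in the vocabulary, its known pieces let the model process it.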
intermediate
What is the role of character-level models in handling OOV words?
Character-level models read words letter by letter, so they can build meaning from any word, even if it was never seen before, reducing OOV problems.
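A minimal sketch of the character-level idea: if the model's input units are characters rather than words, any word can be encoded, seen or not. The alphabet and the zero fallback here are illustrative choices, not a standard:

```python
def char_ids(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Encode a word as a list of character indices (0 for unknown chars)."""
    idx = {c: i + 1 for i, c in enumerate(alphabet)}  # 0 reserved as fallback
    return [idx.get(c, 0) for c in word.lower()]

print(char_ids("cat"))       # → [3, 1, 20]
print(char_ids("zyzzyva"))   # a never-seen word still encodes cleanly
```

Because the character inventory is tiny and closed, there is effectively no OOV problem at this level.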
What does the <UNK> token represent in NLP?
A. A common stop word
B. Unknown or out-of-vocabulary words
C. A punctuation mark
D. A named entity
Answer: B
Which method breaks words into smaller known pieces to handle OOV words?
A. Lemmatization
B. Stop word removal
C. Subword tokenization
D. Part-of-speech tagging
Answer: C
Why might character-level models reduce OOV issues?
A. They process words letter by letter
B. They use word frequency
C. They ignore word order
D. They remove punctuation
Answer: A
What is a downside of replacing OOV words with the <UNK> token?
A. The model treats all unknown words the same, losing specific meaning
B. It increases vocabulary size
C. It slows down training
D. It requires labeled data
Answer: A
Which of these is NOT a common way to handle OOV words?
A. Using character-level models
B. Replacing with the <UNK> token
C. Using subword tokenization
D. Ignoring OOV words completely
Answer: D
Explain what out-of-vocabulary words are and why they pose a challenge in NLP.
Hint: Think about words a model never saw during training.
Describe at least two methods to handle out-of-vocabulary words and how they help.
Hint: Consider simple replacement and breaking words into parts.