Recall & Review
beginner
What is vocabulary size control in NLP?
Vocabulary size control is the process of limiting or managing the number of unique words or tokens used in a language model to improve efficiency and reduce complexity.
beginner
Why do we need to control vocabulary size in NLP models?
Controlling vocabulary size reduces memory use and speeds up training. It also drops rare words, which appear too seldom for the model to learn reliable representations, so the model is more efficient and generalizes better.
intermediate
Name two common methods to control vocabulary size.
1. Limiting vocabulary to the most frequent words. 2. Using subword units like Byte Pair Encoding (BPE) to break rare words into smaller parts.
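The first method can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and lowercasing; the function names `build_vocab` and `encode` and the `<unk>` token are hypothetical choices, not part of any particular library.

```python
from collections import Counter

def build_vocab(texts, max_size, unk_token="<unk>"):
    """Keep the max_size most frequent whitespace tokens; everything else maps to unk_token."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {unk_token: 0}  # index 0 is reserved for unknown words
    for word, _ in counts.most_common(max_size):
        vocab[word] = len(vocab)
    return vocab

def encode(text, vocab, unk_token="<unk>"):
    """Map each word to its id, falling back to the unknown-token id."""
    return [vocab.get(word, vocab[unk_token]) for word in text.lower().split()]

corpus = ["the cat sat on the mat", "the dog sat on the log"]
vocab = build_vocab(corpus, max_size=3)
ids = encode("the cat sat", vocab)  # "cat" is outside the top 3, so it becomes id 0
```

The cost of this approach is visible in the last line: every word outside the cutoff collapses into the same unknown id, so its identity is lost.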
intermediate
How does Byte Pair Encoding (BPE) help with vocabulary size control?
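BPE starts from individual characters and repeatedly merges the most frequent adjacent pair of symbols into a new symbol until the vocabulary reaches a target size. Rare words are then represented as sequences of these smaller units, so the vocabulary stays compact without discarding them.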
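A toy version of the merge loop makes this concrete. This is a simplified sketch, not a production tokenizer: it assumes a pre-counted `{word: frequency}` dictionary, omits end-of-word markers, and the name `learn_bpe` is hypothetical.

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges BPE merges from a {word: frequency} dict."""
    # Each word starts as a tuple of single characters.
    corpus = {tuple(word): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite each word, fusing occurrences of the best pair.
        new_corpus = {}
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_corpus[key] = new_corpus.get(key, 0) + freq
        corpus = new_corpus
    return merges, corpus

merges, segmented = learn_bpe({"low": 5, "lower": 2, "lowest": 3}, num_merges=2)
```

In this sketch the final vocabulary is the set of starting characters plus one new symbol per merge, so `num_merges` directly controls vocabulary size.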
advanced
What is the trade-off when choosing a smaller vocabulary size?
A smaller vocabulary shrinks the embedding and output layers, but each word is split into more tokens, so sequences get longer and cost more to process.
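The effect on sequence length can be seen by segmenting one word with two vocabularies. This is an illustrative sketch using greedy longest-match segmentation; the `tokenize` helper and both vocabularies are made up for the example.

```python
def tokenize(word, vocab):
    """Greedy longest-match segmentation of a single word."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest candidate first, shrinking until an entry matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i]!r}")
    return tokens

small_vocab = set("unbelievable")                     # single characters only
large_vocab = small_vocab | {"un", "believ", "able"}  # plus three subwords

few = tokenize("unbelievable", large_vocab)   # ['un', 'believ', 'able']
many = tokenize("unbelievable", small_vocab)  # one token per character, 12 in total
```

Adding three entries to the vocabulary cuts the sequence from 12 tokens to 3, which is the trade-off in miniature.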
What happens if vocabulary size is too large in an NLP model?
A large vocabulary increases memory use and slows training because the embedding table and the output layer both grow with the number of unique tokens.
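A rough back-of-the-envelope calculation shows the scale. The vocabulary sizes, the embedding dimension of 512, and the helper name `embedding_params` are illustrative assumptions, and the sketch counts only the embedding table (and, optionally, a separate output projection of the same shape).

```python
def embedding_params(vocab_size, embed_dim, tied_output=True):
    """Parameters in the embedding table, doubled if the output layer is not tied to it."""
    table = vocab_size * embed_dim
    return table if tied_output else 2 * table

for vocab_size in (10_000, 50_000, 250_000):
    params = embedding_params(vocab_size, embed_dim=512)
    megabytes = params * 4 / 1e6  # 4 bytes per float32 parameter
    print(f"{vocab_size:>7} tokens -> {params:,} params, {megabytes:.1f} MB")
```

The parameter count scales linearly with vocabulary size, so a vocabulary five times larger needs an embedding table five times larger.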
Which method breaks words into smaller parts to reduce vocabulary size?
BPE splits rare words into subword units, reducing the total vocabulary needed.
Limiting vocabulary to the most frequent words helps because:
Focusing on frequent words reduces noise from rare words and keeps the vocabulary manageable.
What is a downside of using a very small vocabulary?
Smaller vocabularies often mean words are split into many tokens, making sequences longer.
Vocabulary size control is important because:
Controlling vocabulary size helps balance efficiency and model quality.
Explain vocabulary size control and why it matters in NLP models.
Think about how many words a model knows and how that affects speed and memory.
Describe two methods to control vocabulary size and their pros and cons.
Consider how each method handles rare words and vocabulary size.