Recall & Review

beginner

What is vocabulary size control in NLP?

Vocabulary size control is the process of limiting or managing the number of unique words or tokens used in a language model to improve efficiency and reduce complexity.

beginner

Why do we need to control vocabulary size in NLP models?

Controlling vocabulary size helps reduce memory use, speeds up training, and avoids rare words that add noise, making models more efficient and generalizable.
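
The memory effect can be estimated with simple arithmetic. The sketch below assumes an embedding table with one row of float32 weights per token; the function name `embedding_parameters`, the embedding width of 300, and the vocabulary sizes are illustrative choices, not values from any particular model.

```python
def embedding_parameters(vocab_size: int, embedding_dim: int) -> int:
    """Number of weights in an embedding table with one row per token."""
    return vocab_size * embedding_dim


EMBEDDING_DIM = 300  # assumed width, chosen only for illustration

for vocab_size in (10_000, 50_000, 250_000):
    params = embedding_parameters(vocab_size, EMBEDDING_DIM)
    megabytes = params * 4 / 1_000_000  # 4 bytes per float32 weight
    print(f"{vocab_size:>7} tokens: {params:>11,} weights, about {megabytes:.0f} MB")
```

The table grows linearly with vocabulary size, so a vocabulary five times larger needs five times as many embedding weights.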

intermediate

Name two common methods to control vocabulary size.

1. Limiting vocabulary to the most frequent words. 2. Using subword units like Byte Pair Encoding (BPE) to break rare words into smaller parts.
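
The first method can be sketched in plain Python. This is a minimal illustration, assuming whitespace tokenization and lowercasing; the `<unk>` placeholder, the helper names `build_vocab` and `encode`, and the toy corpus are all hypothetical.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words


def build_vocab(sentences, max_size):
    """Keep only the max_size most frequent words in the corpus."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    return {word for word, _ in counts.most_common(max_size)}


def encode(sentence, vocab):
    """Replace every word outside the vocabulary with UNK."""
    return [w if w in vocab else UNK for w in sentence.lower().split()]


corpus = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "the dog saw the cat",
]
vocab = build_vocab(corpus, max_size=4)
print(encode("the dog sat on the mat", vocab))
# prints ['the', '<unk>', 'sat', 'on', 'the', '<unk>']
```

The words that appear only once (dog, mat) fall outside the four-word vocabulary and collapse into a single placeholder, which is the information loss this method accepts in exchange for a smaller model.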

intermediate

How does Byte Pair Encoding (BPE) help with vocabulary size control?

BPE starts from single characters and repeatedly merges the most frequent adjacent pair of symbols into a new subword. The result is a compact vocabulary in which rare words are represented as combinations of smaller units, which keeps the total vocabulary size small.
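
The merge loop can be sketched as follows. This is a simplified teaching version, not the implementation used by any specific tokenizer library: it assumes a word-to-count dictionary as input, ignores end-of-word markers, and uses the hypothetical helper names `learn_bpe`, `merge_pair`, and `segment`.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of pair with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges merge rules from a {word: count} mapping."""
    # Every word starts as a sequence of single characters.
    corpus = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        corpus = {merge_pair(s, best): c for s, c in corpus.items()}
    return merges


def segment(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy word counts, invented for illustration.
word_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(word_counts, num_merges=4)
print(segment("lowest", merges))
```

On this toy corpus, four merges are enough to learn the subwords "low" and "est", so the unseen word "lowest" is split into those two pieces instead of being treated as unknown.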

advanced

What is the trade-off when choosing a smaller vocabulary size?

A smaller vocabulary reduces model size and speeds up training, but each sentence is split into more tokens, so sequences become longer and more expensive to process.
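
The two extremes make the trade-off concrete. The sketch below compares word-level tokens (large vocabulary, short sequences) with character-level tokens (tiny vocabulary, long sequences) on one example sentence chosen for illustration.

```python
sentence = "tokenization controls vocabulary size"

# Word-level: one token per word, but every distinct word needs a vocabulary entry.
word_tokens = sentence.split()

# Character-level: a vocabulary of a few dozen symbols, but many more tokens.
char_tokens = [c for c in sentence if c != " "]

print(len(word_tokens), "word-level tokens")       # 4
print(len(char_tokens), "character-level tokens")  # 34
```

Subword methods such as BPE sit between these extremes, which is why the vocabulary size is usually treated as a setting to tune rather than something to minimize.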

What happens if vocabulary size is too large in an NLP model?

A. The model uses more memory and trains slower

B. The model becomes faster and smaller

C. The model ignores rare words

D. The model always improves accuracy

Which method breaks words into smaller parts to reduce vocabulary size?

A. One-hot encoding

B. Stop word removal

C. Lemmatization

D. Byte Pair Encoding (BPE)

Limiting vocabulary to the most frequent words helps because:

A. It reduces noise and model size

B. Rare words are always unimportant

C. It increases the number of tokens per sentence

D. It makes the model ignore common words

What is a downside of using a very small vocabulary?

A. More memory usage

B. Longer token sequences

C. Slower training

D. Ignoring frequent words

Vocabulary size control is important because:

A. It always improves model accuracy

B. It removes all rare words

C. It balances model size and performance

D. It makes models ignore punctuation

Explain vocabulary size control and why it matters in NLP models.

Describe two methods to control vocabulary size and their pros and cons.

Practice

(1/5)

1. What is the main purpose of controlling vocabulary size in NLP models?

easy

A. To add more rare words to the dataset

B. To increase the number of training epochs

C. To limit the number of words the model uses

D. To make the model ignore stop words

Solution

Step 1: Understand vocabulary size control

Step 2: Identify the main goal

Final Answer:

Quick Check:

Solution

Step 1: Recall CountVectorizer parameters

Step 2: Identify parameter for vocabulary size

Final Answer:

Quick Check:

Solution

Step 1: Understand max_features effect

Step 2: Count unique words and frequencies

Final Answer:

Quick Check:

Solution

Step 1: Check max_features type

Step 2: Confirm other parts are correct

Final Answer:

Quick Check:

Solution

Step 1: Understand problem with large vocabulary

Step 2: Choose best vocabulary control method

Final Answer:

Quick Check: