
Vocabulary size control in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to limit the vocabulary size when tokenizing text.

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(max_features=[1])
texts = ['I love machine learning', 'AI is the future']
X = vectorizer.fit_transform(texts)
Options:
A. 1000
B. 0
C. None
D. -1
Common Mistakes
Using None, which applies no limit at all, or a negative number, which scikit-learn rejects.
Setting max_features to 0, which is also rejected: max_features must be a positive integer or None.
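To see what max_features actually does, the sketch below fits CountVectorizer on a small made-up corpus (the documents and the helper `vocab_with` are hypothetical, not part of the exercise). It assumes a recent scikit-learn, where max_features keeps the terms with the highest total counts across the corpus and an invalid value raises an error that subclasses ValueError.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus; total counts: data=4, science=2, models=2, uses/need/needs=1
docs = [
    "data science uses data",
    "data models need data",
    "science needs models",
]

def vocab_with(max_features):
    """Return the sorted vocabulary, or None if scikit-learn rejects the value."""
    try:
        vec = CountVectorizer(max_features=max_features)
        vec.fit_transform(docs)
    except ValueError:  # recent versions raise InvalidParameterError, a ValueError subclass
        return None
    return sorted(vec.vocabulary_)

print(vocab_with(1))     # ['data']
print(vocab_with(3))     # ['data', 'models', 'science']
print(vocab_with(None))  # no limit: all six words
print(vocab_with(0))     # None: 0 is rejected
```

With the exercise's own texts, the default token pattern ignores one-character tokens such as "I", so the full vocabulary has only seven words.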
Task 2: Fill in the blank (medium)

Complete the code to remove rare words by setting a minimum document frequency.

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(min_df=[1])
texts = ['apple banana apple', 'banana orange', 'apple orange orange']
X = vectorizer.fit_transform(texts)
Options:
A. 2
B. 0.5
C. 1
D. 0
Common Mistakes
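Confusing integers with floats: an integer min_df is an absolute number of documents, while a float between 0.0 and 1.0 is a proportion of documents.
Setting min_df to 1 (the default) or 0, neither of which removes any word.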
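The difference between integer and float min_df values is easiest to see on a corpus that actually contains a rare word. The sketch below uses a hypothetical four-document corpus and a hypothetical helper `vocab_with_min_df`; the expected vocabularies in the comments follow from the document frequencies listed in the code.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus with one genuinely rare word ("kiwi").
docs = [
    "apple banana cherry",
    "apple banana",
    "apple cherry",
    "apple kiwi",
]
# Document frequencies: apple=4, banana=2, cherry=2, kiwi=1

def vocab_with_min_df(min_df):
    """Fit a CountVectorizer with the given min_df and return its sorted vocabulary."""
    vec = CountVectorizer(min_df=min_df)
    vec.fit_transform(docs)
    return sorted(vec.vocabulary_)

print(vocab_with_min_df(1))    # int: at least 1 document -> all four words
print(vocab_with_min_df(2))    # int: at least 2 documents -> drops 'kiwi'
print(vocab_with_min_df(0.5))  # float: 0.5 * 4 docs = 2 documents, same as min_df=2
print(vocab_with_min_df(1.0))  # float: 1.0 * 4 docs = every document -> ['apple']
```

In the exercise's own corpus, apple, banana and orange each appear in exactly two of the three documents, so a threshold of 2 documents keeps all three words; min_df only removes something when a word is rarer than the threshold.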
Task 3: Fill in the blanks (hard)

Fill both blanks so that the code correctly limits the vocabulary size and removes rare words.

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(max_features=[1], min_df=[2])
texts = ['cat dog cat', 'dog mouse', 'cat mouse mouse']
X = vectorizer.fit_transform(texts)
Options:
A. 2
B. 0.5
C. 1
D. 3
Common Mistakes
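Setting min_df too high: if no word reaches the threshold, fit_transform raises a ValueError instead of returning an empty vocabulary.
Passing an invalid type, such as a float for max_features, which must be a positive integer or None.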
Task 4: Fill in the blanks (hard)

Fill both blanks to build a vocabulary of words that appear in at least 2 documents, limited to at most 3 words.

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(min_df=[1], max_features=[2])
texts = ['sun moon sun', 'moon star', 'sun star star']
X = vectorizer.fit_transform(texts)
Options:
A. 2
B. 3
C. 1
D. 4
Common Mistakes
Swapping min_df and max_features values.
Using values that remove all words.
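When min_df and max_features are combined, scikit-learn (as we understand its implementation) first drops words outside the document-frequency bounds and then keeps the max_features most frequent of the survivors. The sketch below shows this on a hypothetical corpus in which "sun" is frequent overall but occurs in only one document; the corpus and the helper `vocab` are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus: "sun" has a high total count but a document frequency of 1.
docs = [
    "sun sun sun sun sun",
    "moon star star star",
    "moon star sky",
    "sky star moon",
]
# Total counts:        sun=5, star=5, moon=3, sky=2
# Document frequency:  sun=1, star=3, moon=3, sky=2

def vocab(**params):
    """Fit a CountVectorizer with the given parameters and return its sorted vocabulary."""
    vec = CountVectorizer(**params)
    vec.fit_transform(docs)
    return sorted(vec.vocabulary_)

print(vocab(max_features=2))            # ['star', 'sun']: top 2 by total count
print(vocab(min_df=2))                  # ['moon', 'sky', 'star']: 'sun' is pruned
print(vocab(min_df=2, max_features=2))  # ['moon', 'star']: top 2 of the survivors
```

In the exercise's corpus, sun, moon and star each appear in exactly two documents, so swapping the values (min_df=3) would prune every word and fit_transform would raise a ValueError.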
Task 5: Fill in the blanks (hard)

Fill all three blanks to build a vocabulary with at most 4 words, a minimum document frequency of 2, and a maximum document frequency of 0.8 (80% of documents).

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(max_features=[1], min_df=[2], max_df=[3])
texts = ['apple banana apple', 'banana orange apple', 'apple orange banana', 'banana apple orange']
X = vectorizer.fit_transform(texts)
Options:
A. 4
B. 2
C. 0.8
D. 3
Common Mistakes
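Using a float max_df outside the range 0.0 to 1.0 (an integer max_df is allowed, but it means an absolute number of documents).
Setting min_df so that it requires more documents than max_df allows; scikit-learn then raises a ValueError rather than returning an empty vocabulary.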
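Applying the three constraints to the exercise's texts shows why max_df matters here: apple and banana appear in all four documents, so a proportion of 0.8 (at most 3.2 documents) removes them. The sketch also triggers the min_df/max_df conflict from the last mistake; it assumes a recent scikit-learn and only checks that a ValueError is raised, not its exact message.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['apple banana apple', 'banana orange apple',
         'apple orange banana', 'banana apple orange']
# Document frequencies (out of 4): apple=4, banana=4, orange=3

vec = CountVectorizer(max_features=4, min_df=2, max_df=0.8)
X = vec.fit_transform(texts)
# max_df=0.8 allows at most 0.8 * 4 = 3.2 documents, so apple and banana are dropped
print(sorted(vec.vocabulary_))       # ['orange']
print(X.toarray().ravel().tolist())  # [0, 1, 1, 1]: orange count per document

# min_df=4 demands more documents than max_df=0.8 allows (3.2), which is an error
try:
    CountVectorizer(min_df=4, max_df=0.8).fit_transform(texts)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```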