Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to limit the vocabulary size when tokenizing text.
NLP
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_features=[1])
texts = ['I love machine learning', 'AI is the future']
X = vectorizer.fit_transform(texts)
Common Mistakes
Using None, which does not limit the vocabulary size, or a negative number, which raises an error.
Setting max_features to 0, which is rejected as an invalid value rather than producing an empty vocabulary.
Explanation
Setting max_features=1000 keeps at most the 1000 most frequent words across the corpus.
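A minimal sketch of the effect, assuming scikit-learn's default tokenizer (which lowercases text and ignores single-character tokens such as "I"). The limit of 3 is an illustrative value that is not part of the exercise; it is there because 1000 is far larger than this tiny corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love machine learning', 'AI is the future']

# No limit: every token of two or more characters becomes a feature.
full = CountVectorizer().fit(texts)
full_vocab = sorted(full.vocabulary_)

# A limit of 1000 is far above the number of distinct tokens here,
# so nothing is dropped.
capped = CountVectorizer(max_features=1000).fit(texts)
capped_vocab = sorted(capped.vocabulary_)

# A smaller, illustrative limit (an assumption for this demo) does
# shrink the vocabulary.
small = CountVectorizer(max_features=3).fit(texts)
small_vocab = sorted(small.vocabulary_)

print(len(full_vocab), len(capped_vocab), len(small_vocab))
```

On a corpus this small every word occurs once, so which 3 words survive the smaller limit depends on tie-breaking; only the vocabulary size is meaningful here.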
2. Fill in the blank (medium)
Complete the code to remove rare words by setting a minimum document frequency.
NLP
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=[1])
texts = ['apple banana apple', 'banana orange', 'apple orange orange']
X = vectorizer.fit_transform(texts)
Common Mistakes
Using an integer without realizing it is an absolute document count rather than a proportion.
Setting min_df to 0, which keeps all words.
Explanation
min_df=0.5 means a word must appear in at least 50% of the documents to be kept.
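The following sketch shows how a fractional min_df plays out on the exercise's three documents. Because every word there appears in exactly 2 documents, nothing is actually removed, so the sketch also adds one hypothetical extra document ('apple kiwi', not part of the exercise) to make the filtering visible.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['apple banana apple', 'banana orange', 'apple orange orange']

# A float min_df is a fraction of documents: 0.5 * 3 documents = 1.5,
# so a word needs to appear in at least 2 documents to be kept.
vectorizer = CountVectorizer(min_df=0.5).fit(texts)
kept = sorted(vectorizer.vocabulary_)
print(kept)  # every word is in exactly 2 documents, so all three stay

# Hypothetical extra document (an assumption for this demo) with a rare word.
more_texts = texts + ['apple kiwi']
rare_filter = CountVectorizer(min_df=0.5).fit(more_texts)
kept_more = sorted(rare_filter.vocabulary_)
print(kept_more)  # 'kiwi' is in 1 of 4 documents (25%) and is dropped
```

Note that min_df counts documents containing a word, not total occurrences: 'apple' appearing twice in the first document still counts as one document.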
3. Fill in the blank (hard)
Complete the code to limit the vocabulary size and set the minimum document frequency.
NLP
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_features=[1], min_df=[2])
texts = ['cat dog cat', 'dog mouse', 'cat mouse mouse']
X = vectorizer.fit_transform(texts)
Common Mistakes
Setting min_df too high removes every word, and fit_transform then raises a ValueError because no terms remain.
Using invalid types for max_features or min_df.
Explanation
max_features=2 limits the vocabulary to the 2 most frequent words; min_df=1 (the default) requires a word to appear in at least 1 document, so it removes nothing by itself.
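To see which words survive, it can help to look at the total counts first. This is a small sketch on the exercise's own texts; the variable names (`totals`, `kept`) are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['cat dog cat', 'dog mouse', 'cat mouse mouse']

# Total occurrences of each word across the corpus, with no limits applied.
# Columns are in alphabetical order, which matches sorted(vocabulary_).
unlimited = CountVectorizer()
counts = unlimited.fit_transform(texts).toarray().sum(axis=0)
totals = dict(zip(sorted(unlimited.vocabulary_), counts.tolist()))
print(totals)  # {'cat': 3, 'dog': 2, 'mouse': 3}

# max_features=2 keeps the two words with the highest total counts.
limited = CountVectorizer(max_features=2, min_df=1).fit(texts)
kept = sorted(limited.vocabulary_)
print(kept)  # ['cat', 'mouse']
```

'dog' is dropped because its total count (2) is the lowest, even though it appears in as many documents as the other two words.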
4. Fill in the blank (hard)
Fill both blanks to create a vocabulary of words appearing in at least 2 documents and limit its size to 3.
NLP
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=[1], max_features=[2])
texts = ['sun moon sun', 'moon star', 'sun star star']
X = vectorizer.fit_transform(texts)
Common Mistakes
Swapping min_df and max_features values.
Using values that remove all words, which makes fit_transform raise a ValueError.
Explanation
min_df=2 keeps words that appear in at least 2 documents; max_features=3 limits the vocabulary to 3 words.
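A short sketch of both the intended answer and the "swapped values" mistake, using the exercise's texts. The name `swapped_error` is illustrative; the point is that scikit-learn raises a ValueError when pruning leaves no terms.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['sun moon sun', 'moon star', 'sun star star']

# Each word appears in exactly 2 of the 3 documents, so min_df=2 keeps
# all of them, and max_features=3 leaves room for all three.
vectorizer = CountVectorizer(min_df=2, max_features=3).fit(texts)
kept = sorted(vectorizer.vocabulary_)
print(kept)  # ['moon', 'star', 'sun']

# Swapping the values: min_df=3 requires a word to be in all 3 documents,
# which no word satisfies, so fitting fails instead of returning nothing.
swapped_error = None
try:
    CountVectorizer(min_df=3, max_features=2).fit(texts)
except ValueError as exc:
    swapped_error = exc
print(type(swapped_error).__name__)  # ValueError
```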
5. Fill in the blank (hard)
Fill all three blanks to create a vocabulary with max size 4, min document frequency 2, and max document frequency 0.8.
NLP
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_features=[1], min_df=[2], max_df=[3])
texts = ['apple banana apple', 'banana orange apple', 'apple orange banana', 'banana apple orange']
X = vectorizer.fit_transform(texts)
Common Mistakes
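Using a float max_df greater than 1.0 or less than 0; a float is a proportion of documents, while an integer is an absolute document count.
Setting min_df higher than max_df, which raises a ValueError instead of producing a vocabulary.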
Explanation
max_features=4 caps the vocabulary at 4 words; min_df=2 keeps words that appear in at least 2 documents; max_df=0.8 removes very common words that appear in more than 80% of documents.
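Applied to the exercise's four documents, these settings leave a smaller vocabulary than one might expect. The sketch below first computes document frequencies without any filtering, then applies the three parameters; the variable names are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    'apple banana apple',
    'banana orange apple',
    'apple orange banana',
    'banana apple orange',
]

# Document frequency of each word before any filtering.
unfiltered = CountVectorizer()
present = unfiltered.fit_transform(texts).toarray() > 0
doc_freq = dict(zip(sorted(unfiltered.vocabulary_), present.sum(axis=0).tolist()))
print(doc_freq)  # {'apple': 4, 'banana': 4, 'orange': 3}

# max_df=0.8 allows at most 0.8 * 4 = 3.2 documents, so words found in
# all 4 documents are removed; 'orange' (3 documents) passes both bounds.
filtered = CountVectorizer(max_features=4, min_df=2, max_df=0.8).fit(texts)
kept = sorted(filtered.vocabulary_)
print(kept)  # ['orange']
```

max_features=4 has no effect here because only one word survives the document-frequency filters, which are applied before the size cap.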