
Bag of Words (CountVectorizer) in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to import the CountVectorizer from sklearn.

NLP
from sklearn.feature_extraction.text import [1]
A. CountVectorizer
B. TfidfVectorizer
C. DictVectorizer
D. FeatureHasher
Common Mistakes
Importing TfidfVectorizer instead of CountVectorizer.
Using DictVectorizer, which converts feature dictionaries rather than raw text.
Trying to import from the wrong sklearn module.
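As a quick check on the import path, the sketch below imports all four options and reports which of them are defined under `sklearn.feature_extraction.text`. The variable names `text_module` and `is_text_vectorizer` are illustrative, and the snippet assumes scikit-learn is installed.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_extraction import DictVectorizer, FeatureHasher

# Each class records the module it is defined in via __module__.
text_module = "sklearn.feature_extraction.text"

is_text_vectorizer = {
    cls.__name__: cls.__module__.startswith(text_module)
    for cls in (CountVectorizer, TfidfVectorizer, DictVectorizer, FeatureHasher)
}

# Only the two text vectorizers live in the .text submodule.
print(is_text_vectorizer)
```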
2. Fill in the blank (medium)

Complete the code to create a CountVectorizer instance.

NLP
vectorizer = [1]()
A. StandardScaler
B. CountVectorizer
C. TfidfVectorizer
D. LabelEncoder
Common Mistakes
Using TfidfVectorizer which computes TF-IDF instead of counts.
Using StandardScaler which is for numeric data scaling.
Using LabelEncoder which is for categorical labels.
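To see why the choice between CountVectorizer and TfidfVectorizer matters, here is a minimal sketch on a made-up two-document corpus (the `docs` list is an assumption for illustration). With default settings, the first produces integer counts and the second produces floating-point weights.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Illustrative corpus, not taken from the exercise.
docs = ["the cat sat", "the cat sat on the mat"]

count_vec = CountVectorizer()
tfidf_vec = TfidfVectorizer()

counts = count_vec.fit_transform(docs)   # raw term counts
weights = tfidf_vec.fit_transform(docs)  # TF-IDF weights

print("Count dtype:", counts.dtype)
print("TF-IDF dtype:", weights.dtype)
print("Largest count:", counts.toarray().max())  # "the" occurs twice in the second document
```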
3. Fill in the blank (hard)

Complete the code so that it fits the vectorizer and transforms the text data into count vectors in one call.

NLP
X = vectorizer.[1](documents)
A. transform
B. transform_fit
C. fit
D. fit_transform
Common Mistakes
Using transform_fit which does not exist.
Using fit alone which does not transform data.
Using transform alone without fitting first.
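The three mistakes listed here can be reproduced in a short sketch. The `documents` list is an illustrative assumption. The snippet shows that `transform` on an unfitted vectorizer raises `NotFittedError`, that `fit` returns the vectorizer itself rather than a matrix, and that `fit` followed by `transform` gives the same result as `fit_transform`.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative corpus, not taken from the exercise.
documents = ["I love NLP", "NLP is fun"]

vectorizer = CountVectorizer()

# Mistake: calling transform before the vocabulary has been learned.
try:
    vectorizer.transform(documents)
    transform_failed = False
except NotFittedError:
    transform_failed = True

# Mistake: fit alone learns the vocabulary but returns the vectorizer, not a matrix.
fit_result = vectorizer.fit(documents)

# Two-step version: fit (done above), then transform.
X_two_step = vectorizer.transform(documents)

# One-step version on a fresh vectorizer.
X_one_step = CountVectorizer().fit_transform(documents)

print("transform before fit failed:", transform_failed)
print("fit returned the vectorizer:", fit_result is vectorizer)
print("Same result either way:", (X_two_step != X_one_step).nnz == 0)
```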
4. Fill in the blank (hard)

Fill in the blank to create a dictionary of word indices for words longer than 3 characters.

NLP
word_counts = {word: vectorizer.vocabulary_.get(word, 0) for word in vectorizer.get_feature_names_out() if len(word) [1] 3}
A. ==
B. <
C. >
D. !=
Common Mistakes
Using '<' which selects shorter words.
Using '==' which selects words of length exactly 3.
Using '!=', which selects every word whose length is not exactly 3, including shorter ones.
5. Fill in the blank (hard)

Fill all three blanks to print the feature names, shape of X, and the count matrix as an array.

NLP
print('Features:', vectorizer.[1]())
print('Shape:', X.[2])
print('Array:\n', X.[3]())
A. get_feature_names_out
B. shape
C. toarray
D. todense
Common Mistakes
Using todense instead of toarray; todense returns a numpy.matrix, while toarray returns a regular ndarray.
Using shape as a method instead of attribute.
Using get_feature_names, which was deprecated in scikit-learn 1.0 and removed in 1.2 in favor of get_feature_names_out.