
Bag of Words and TF-IDF in ML Python - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to create a Bag of Words model using CountVectorizer.

ML Python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['I love machine learning', 'Machine learning is fun']
vectorizer = CountVectorizer()
X = vectorizer.[1](corpus)
print(X.toarray())
Options:
A. fit_transform
B. transform
C. fit
D. fit_predict
Common Mistakes
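Using only fit(), which learns the vocabulary and returns the fitted vectorizer rather than a document-term matrix.
Calling transform() before the vectorizer has been fitted, which raises a NotFittedError.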
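
The difference between fitting and transforming can be sketched without scikit-learn. The following is a hand-rolled illustration, not the library's implementation: the `tokenize`, `fit`, and `transform` functions are hypothetical helpers, and the tokenizer assumes lowercasing and words of two or more characters, which is why the single-letter token "I" does not appear in the vocabulary.

```python
import re
from collections import Counter

# Assumed tokenization rule: lowercase, words of two or more characters.
TOKEN = re.compile(r"\b\w\w+\b")

def tokenize(text):
    return TOKEN.findall(text.lower())

def fit(corpus):
    """Learn the vocabulary: the sorted unique tokens across all documents."""
    return sorted({tok for doc in corpus for tok in tokenize(doc)})

def transform(corpus, vocabulary):
    """Count how often each vocabulary term occurs in each document."""
    rows = []
    for doc in corpus:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows

corpus = ['I love machine learning', 'Machine learning is fun']
vocabulary = fit(corpus)                  # step 1: learn the columns
matrix = transform(corpus, vocabulary)    # step 2: fill in the counts

print(vocabulary)  # ['fun', 'is', 'learning', 'love', 'machine']
print(matrix)      # [[0, 0, 1, 1, 1], [1, 1, 1, 0, 1]]
```

Fitting alone produces only the vocabulary, and transforming needs a vocabulary to exist first, which is why the two steps are usually combined on training data.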
2. Fill in the blank (medium)

Complete the code to create a TF-IDF matrix from the corpus.

ML Python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ['Data science is awesome', 'I love data science']
vectorizer = TfidfVectorizer()
X = vectorizer.[1](corpus)
print(X.toarray())
Options:
A. fit
B. fit_predict
C. transform
D. fit_transform
Common Mistakes
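Using only fit(), which learns the vocabulary and IDF weights but returns the fitted vectorizer rather than a TF-IDF matrix.
Calling transform() before the vectorizer has been fitted, which raises a NotFittedError.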
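
To see what a TF-IDF matrix contains, here is a minimal pure-Python sketch. It assumes a smoothed IDF of ln((1 + n) / (1 + df)) + 1 followed by L2 normalization of each row, which is meant to mirror TfidfVectorizer's default settings; the `tokenize` and `tfidf` helpers are hypothetical and are not part of scikit-learn.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed rule: lowercase, words of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf(corpus):
    docs = [Counter(tokenize(doc)) for doc in corpus]
    vocabulary = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {
        term: math.log((1 + n_docs) / (1 + sum(term in doc for doc in docs))) + 1
        for term in vocabulary
    }
    rows = []
    for doc in docs:
        weights = [doc[term] * idf[term] for term in vocabulary]
        # L2-normalize so every document vector has length 1.
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocabulary, rows

corpus = ['Data science is awesome', 'I love data science']
vocabulary, matrix = tfidf(corpus)
print(vocabulary)
for row in matrix:
    print([round(w, 3) for w in row])
```

In this corpus "data" and "science" occur in both documents, so they receive a lower weight than "awesome" or "love", which each occur in only one.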
3. Fill in the blank (hard)

Complete the code to get the feature names from the fitted CountVectorizer.

ML Python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['AI is the future', 'Future is AI']
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
features = vectorizer.[1]()
print(features)
Options:
A. feature_names
B. get_feature_names
C. get_feature_names_out
D. features_names
Common Mistakes
Using get_feature_names(), which was deprecated in scikit-learn 1.0 and removed in 1.2, so it raises an AttributeError on current versions.
Trying to access attributes or methods that do not exist on the vectorizer.
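
If the same code has to run on both old and new scikit-learn releases, one option is a small compatibility helper. This is only a sketch: `feature_names` is a hypothetical function name, and it assumes the object passed in is a fitted vectorizer exposing at least one of the two methods.

```python
def feature_names(vectorizer):
    """Return the learned feature names of a fitted vectorizer as a plain list."""
    # Prefer the current method name when it is available.
    getter = getattr(vectorizer, "get_feature_names_out", None)
    if getter is None:
        # Fall back to the older method name on old releases.
        getter = vectorizer.get_feature_names
    return list(getter())

# Usage with a fitted vectorizer (requires scikit-learn):
#   features = feature_names(vectorizer)
#   print(features)
```

When only recent versions need to be supported, calling get_feature_names_out() directly is simpler.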
4. Fill in the blank (hard)

Fill both blanks to create a dictionary that maps each word longer than 3 characters to its length.

ML Python
words = ['data', 'science', 'is', 'fun']
word_counts = {word: [1] for word in words if len(word) [2] 3}
print(word_counts)
Options:
A. len(word)
B. >
C. <
D. word
Common Mistakes
Using the word itself as the value instead of its length.
Using the wrong comparison operator.
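
A filtered dictionary comprehension is shorthand for a loop with an `if` inside it. The sketch below uses a different, made-up word list and a different threshold to show that the two forms produce the same result.

```python
tokens = ['bag', 'of', 'words', 'model']

# Comprehension form: key, value, loop, then the filter condition.
lengths = {tok: len(tok) for tok in tokens if len(tok) > 2}

# Equivalent explicit loop.
lengths_loop = {}
for tok in tokens:
    if len(tok) > 2:
        lengths_loop[tok] = len(tok)

print(lengths)  # {'bag': 3, 'words': 5, 'model': 5}
```

The expression before `for` decides what is stored, and the condition after `if` decides which items are kept.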
5. Fill in the blank (hard)

Fill all three blanks to create a filtered dictionary of words and their counts where count is greater than 1.

ML Python
word_freq = {'data': 2, 'science': 1, 'fun': 3}
filtered = { [1]: [2] for [3] in word_freq.items() if [2] > 1 }
print(filtered)
Options:
A. word
B. count
C. word, count
D. item
Common Mistakes
Using incorrect variable names that don't match the unpacking.
Not filtering correctly by count.
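
The same unpack-and-filter pattern applies to counts built from raw text. As an illustration with a made-up sentence, the sketch below counts tokens with `collections.Counter` and keeps only those that occur at least twice.

```python
from collections import Counter

tokens = 'the cat saw the dog and the cat'.split()
counts = Counter(tokens)

# items() yields (key, value) pairs, so two names are needed to unpack them.
frequent = {tok: n for tok, n in counts.items() if n >= 2}

print(frequent)  # {'the': 3, 'cat': 2}
```

The names used for unpacking must match the names used in the key, value, and filter expressions, otherwise Python raises a NameError.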