
Text feature basics (CountVectorizer, TF-IDF) in ML Python - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to create a CountVectorizer instance.

ML Python
from sklearn.feature_extraction.text import [1]
vectorizer = [1]()
Options:
A. LabelEncoder
B. TfidfVectorizer
C. CountVectorizer
D. DictVectorizer
Common Mistakes
Using TfidfVectorizer, which produces TF-IDF weights rather than raw counts
Using LabelEncoder, which encodes target labels, not text features
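
As a rough sketch of what the completed snippet gives you, the following creates the vectorizer and looks at two of its default settings through `get_params()`. It assumes scikit-learn is installed; the variable name `params` is illustrative and not part of the exercise.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Create the vectorizer with its default settings
vectorizer = CountVectorizer()

# get_params() returns the constructor arguments as a dict
params = vectorizer.get_params()
print(params["lowercase"])    # whether text is lowercased before tokenizing
print(params["ngram_range"])  # (1, 1) means single words only
```

Nothing is learned at this point. The vocabulary only exists after the vectorizer has been fitted on some text.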
2. Fill in the blank (medium)

Complete the code to transform text data into a count matrix.

ML Python
texts = ['hello world', 'hello machine learning']
count_matrix = vectorizer.[1](texts)
Options:
A. fit
B. predict
C. transform
D. fit_transform
Common Mistakes
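Using fit alone, which learns the vocabulary but returns the vectorizer, not the matrix
Using transform before the vectorizer has been fitted, which raises an error
Using predict, which vectorizers do not provide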
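
The difference between these methods may be easier to see in a small run. The sketch below assumes scikit-learn is installed and reuses the `texts` list from the task; the names `raised` and `columns` are illustrative.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer

texts = ['hello world', 'hello machine learning']

# Calling transform on a vectorizer that was never fitted fails
try:
    CountVectorizer().transform(texts)
    raised = False
except NotFittedError:
    raised = True
print(raised)

# fit_transform learns the vocabulary and builds the matrix in one call
vectorizer = CountVectorizer()
count_matrix = vectorizer.fit_transform(texts)

# Order the learned words by their column index
columns = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(columns)
print(count_matrix.toarray())
```

Each row of the matrix corresponds to one input text and each column to one vocabulary word. The result is a sparse matrix, so `toarray()` is used here only to make the small example readable.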
3. Fill in the blank (hard)

Complete the code to create a TF-IDF vectorizer that works directly on raw text.

ML Python
from sklearn.feature_extraction.text import [1]
tfidf_vectorizer = [1]()
Options:
A. CountVectorizer
B. TfidfVectorizer
C. TfidfTransformer
D. HashingVectorizer
Common Mistakes
Using TfidfTransformer, which expects a count matrix as input, not raw text
Using CountVectorizer, which only counts words and applies no TF-IDF weighting
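
To illustrate how the three classes relate, the sketch below compares the one-step route (TfidfVectorizer) with the two-step route (CountVectorizer followed by TfidfTransformer) under default settings. It assumes scikit-learn and NumPy are installed; the `docs` corpus and the variable names are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

docs = ['hello world', 'hello machine learning']  # illustrative corpus

# One step: raw text in, TF-IDF matrix out
one_step = TfidfVectorizer().fit_transform(docs)

# Two steps: count first, then reweight the counts
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

# With default settings both routes should give the same weights
same = np.allclose(one_step.toarray(), two_step.toarray())
print(same)
```

The two-step route can be useful when the raw counts are needed as well; otherwise TfidfVectorizer is the shorter option.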
4. Fill in the blanks (hard)

Fill both blanks to create a dictionary of word counts for words longer than 3 letters.

ML Python
words = ['data', 'science', 'is', 'fun']
word_counts = {word: [1] for word in words if len(word) [2] 3}
Options:
A. 1
B. >
C. >=
D. len(word)
Common Mistakes
Using len(word) as the count instead of 1
Using >= instead of >, which also includes 3-letter words such as 'fun'
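
The effect of the comparison operator can be checked directly in plain Python. This sketch reuses the `words` list from the task; the name `with_three` is illustrative.

```python
words = ['data', 'science', 'is', 'fun']

# Strictly longer than 3 letters: 'is' and 'fun' are filtered out
word_counts = {word: 1 for word in words if len(word) > 3}
print(word_counts)  # {'data': 1, 'science': 1}

# With >= the 3-letter word 'fun' slips through
with_three = {word: 1 for word in words if len(word) >= 3}
print(with_three)  # {'data': 1, 'science': 1, 'fun': 1}
```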
5. Fill in the blanks (hard)

Fill all three blanks to create a TF-IDF matrix from text data.

ML Python
from sklearn.feature_extraction.text import [1]
texts = ['machine learning', 'deep learning', 'machine intelligence']
tfidf = [2]()
matrix = tfidf.[3](texts)
Options:
A. TfidfVectorizer
B. TfidfTransformer
C. fit_transform
D. CountVectorizer
Common Mistakes
Using TfidfTransformer, which requires a count matrix as input
Using CountVectorizer, which does not compute TF-IDF weights
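
Once the blanks are filled, it can help to inspect the resulting matrix. The sketch below assumes scikit-learn and NumPy are installed and relies on TfidfVectorizer's default settings; the names `row_norms`, `deep_weight`, and `learning_weight` are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ['machine learning', 'deep learning', 'machine intelligence']
tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(texts)

# One row per document, one column per distinct word
print(matrix.shape)

# With the default norm='l2', each row has unit length
row_norms = np.linalg.norm(matrix.toarray(), axis=1)
print(row_norms)

# In 'deep learning', the rarer word 'deep' outweighs 'learning'
vocab = tfidf.vocabulary_
deep_weight = matrix[1, vocab['deep']]
learning_weight = matrix[1, vocab['learning']]
print(deep_weight > learning_weight)
```

The word 'learning' appears in two of the three texts, so its inverse document frequency is lower than that of 'deep', which appears in only one.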