Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create a CountVectorizer instance.
ML Python
from sklearn.feature_extraction.text import [1]
vectorizer = [1]()
Common Mistakes
Using TfidfVectorizer instead of CountVectorizer
Using LabelEncoder, which encodes target labels, not text features
Answer: CountVectorizer. It converts text into a matrix of token counts.
2. Fill in the blank (medium)
Complete the code to transform text data into a count matrix.
ML Python
texts = ['hello world', 'hello machine learning']
count_matrix = vectorizer.[1](texts)
Common Mistakes
Using fit alone, which learns the vocabulary but returns the vectorizer, not the matrix
Using predict, which CountVectorizer does not have
Answer: fit_transform. It learns the vocabulary and returns the count matrix in one step.
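The difference between fit and fit_transform can be checked on the two texts from this task. This is a minimal sketch that assumes scikit-learn is installed; the variable names vocabulary, dense_counts, other, and fit_result are illustrative and not part of the exercise.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['hello world', 'hello machine learning']

vectorizer = CountVectorizer()
# fit_transform learns the vocabulary and returns a sparse count matrix
count_matrix = vectorizer.fit_transform(texts)

# vocabulary_ maps each term to its column index; sort terms by that index
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
dense_counts = count_matrix.toarray().tolist()

print(vocabulary)    # ['hello', 'learning', 'machine', 'world']
print(dense_counts)  # [[1, 0, 0, 1], [1, 1, 1, 0]]

# fit alone only learns the vocabulary and returns the vectorizer itself
other = CountVectorizer()
fit_result = other.fit(texts)
print(fit_result is other)  # True
```

Each row is one text and each column one vocabulary term, so the first row records one 'hello' and one 'world'.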
3. Fill in the blank (hard)
Fix the error in the code to create a TF-IDF vectorizer.
ML Python
from sklearn.feature_extraction.text import [1]
tfidf_vectorizer = [1]()
Common Mistakes
Using TfidfTransformer, which needs a count matrix as input rather than raw text
Using CountVectorizer, which only counts words
Answer: TfidfVectorizer. It converts raw text directly to TF-IDF features.
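To see why TfidfTransformer is listed as a mistake here, it helps to compare the one-step and two-step routes. The sketch below assumes scikit-learn and NumPy are installed, uses default settings throughout, and borrows the sample texts from task 2 as an illustration; the names direct, counts, two_step, and same are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

texts = ['hello world', 'hello machine learning']

# One step: raw text -> TF-IDF features
direct = TfidfVectorizer().fit_transform(texts)

# Two steps: raw text -> token counts -> TF-IDF features
counts = CountVectorizer().fit_transform(texts)
two_step = TfidfTransformer().fit_transform(counts)

# With default settings, both routes give the same matrix
same = np.allclose(direct.toarray(), two_step.toarray())
print(same)  # True
```

TfidfTransformer is not wrong in itself; it simply expects counts, so it cannot fill a blank that must accept raw text.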
4. Fill in the blank (hard)
Fill both blanks to create a dictionary of word counts for words longer than 3 letters.
ML Python
words = ['data', 'science', 'is', 'fun']
word_counts = {word: [1] for word in words if len(word) [2] 3}
Common Mistakes
Using len(word) as the count instead of 1
Using >= instead of >, which also includes 3-letter words
Answer: 1 and >. Each word longer than 3 letters is assigned a count of 1, and the filter is len(word) > 3.
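The comprehension and the >= mistake can be run side by side. This is a small illustrative sketch; the with_three and repeated variables, and the Counter variant for lists with repeated words, are additions that the exercise itself does not include.

```python
from collections import Counter

words = ['data', 'science', 'is', 'fun']

# The exercise's answer: every kept word gets the constant count 1
word_counts = {word: 1 for word in words if len(word) > 3}
print(word_counts)  # {'data': 1, 'science': 1}

# With >= 3, the three-letter word 'fun' slips through the filter
with_three = {word: 1 for word in words if len(word) >= 3}
print(with_three)  # {'data': 1, 'science': 1, 'fun': 1}

# Hypothetical variant: actual frequencies when the list has repeats
repeated = ['data', 'science', 'data', 'is', 'fun']
true_counts = Counter(w for w in repeated if len(w) > 3)
print(true_counts['data'], true_counts['science'])  # 2 1
```

The constant 1 only equals the true frequency because each word appears once in the exercise's list; Counter handles the general case.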
5. Fill in the blank (hard)
Fill all three blanks to create a TF-IDF matrix from text data.
ML Python
from sklearn.feature_extraction.text import [1]
texts = ['machine learning', 'deep learning', 'machine intelligence']
tfidf = [2]()
matrix = tfidf.[3](texts)
Common Mistakes
Using TfidfTransformer, which requires a count matrix as input
Using CountVectorizer, which does not compute TF-IDF
Answer: TfidfVectorizer, TfidfVectorizer, and fit_transform. Import and instantiate TfidfVectorizer, then call fit_transform to get the TF-IDF matrix.
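With the blanks filled in, the code can be run to inspect what the matrix contains. This sketch assumes scikit-learn and NumPy are installed and relies on TfidfVectorizer's default settings (L2 row normalization, smoothed IDF); the names vocabulary, dense, and row_norms are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ['machine learning', 'deep learning', 'machine intelligence']

tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(texts)

# Column order follows the learned vocabulary's indices
vocabulary = sorted(tfidf.vocabulary_, key=tfidf.vocabulary_.get)
print(vocabulary)    # ['deep', 'intelligence', 'learning', 'machine']
print(matrix.shape)  # (3, 4)

dense = matrix.toarray()
# With the default norm='l2', every row has unit length
row_norms = np.linalg.norm(dense, axis=1)

# In 'deep learning', the rarer word 'deep' outweighs 'learning'
deep_col = vocabulary.index('deep')
learning_col = vocabulary.index('learning')
print(dense[1, deep_col] > dense[1, learning_col])  # True
```

'learning' appears in two of the three texts and 'deep' in only one, so 'deep' receives the higher inverse document frequency and the larger weight.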