Complete the code to create a Bag of Words model using CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['I love machine learning', 'Machine learning is fun']
vectorizer = CountVectorizer()
X = vectorizer.[1](corpus)
print(X.toarray())
The fit_transform method learns the vocabulary and transforms the text data into a Bag of Words matrix in one step.
Complete the code to create a TF-IDF matrix from the corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ['Data science is awesome', 'I love data science']
vectorizer = TfidfVectorizer()
X = vectorizer.[1](corpus)
print(X.toarray())
The fit_transform method fits the TF-IDF vectorizer to the corpus and transforms the text into TF-IDF features.
Fix the error in the code to correctly get feature names from the CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['AI is the future', 'Future is AI']
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
features = vectorizer.[1]()
print(features)
The method get_feature_names_out() is the correct and current way to get feature names from a fitted CountVectorizer.
Fill both blanks to create a dictionary of word counts for words longer than 3 characters.
words = ['data', 'science', 'is', 'fun']
word_counts = {word: [1] for word in words if len(word) [2] 3}
print(word_counts)
The dictionary comprehension maps each word to its length and includes only words longer than 3 characters.
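With the blanks filled ([1] as len(word), [2] as >), the comprehension reads:

```python
words = ['data', 'science', 'is', 'fun']

# Map each word to its length, keeping only words longer than 3 characters
word_counts = {word: len(word) for word in words if len(word) > 3}
print(word_counts)  # {'data': 4, 'science': 7}
```

'is' (2 characters) and 'fun' (3 characters) fail the strict len(word) > 3 test, so only 'data' and 'science' remain.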
Fill all three blanks to create a filtered dictionary of words and their counts where count is greater than 1.
word_freq = {'data': 2, 'science': 1, 'fun': 3}
filtered = { [1]: [2] for [3] in word_freq.items() if [2] > 1 }
print(filtered)
The dictionary comprehension iterates over word-frequency pairs and keeps only those with count greater than 1.
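With the blanks filled ([1] as word, [2] as count, and [3] as the unpacked pair word, count), the comprehension reads:

```python
word_freq = {'data': 2, 'science': 1, 'fun': 3}

# Unpack each (word, count) pair from items() and keep counts greater than 1
filtered = {word: count for word, count in word_freq.items() if count > 1}
print(filtered)  # {'data': 2, 'fun': 3}
```

Unpacking the (key, value) tuples directly in the for clause lets the same name, count, serve as both the value expression and the filter condition.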