from sklearn.feature_extraction.text import CountVectorizer

corpus = ['apple orange apple', 'orange banana orange']
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
result = X.toarray()
feature_names = vectorizer.get_feature_names_out()
print(feature_names)
print(result)
CountVectorizer extracts the unique words and sorts them alphabetically: 'apple', 'banana', 'orange'. It then counts how many times each word appears in each document: the first document has 2 'apple' and 1 'orange', giving the row [2, 0, 1]; the second has 1 'banana' and 2 'orange', giving [0, 1, 2].
TF-IDF stands for Term Frequency-Inverse Document Frequency. It lowers the importance of words that appear in many documents (common words) and raises the importance of words that are rare but may carry more meaning.
Both CountVectorizer and TfidfVectorizer create vectors with the same number of features (one per vocabulary word). TF-IDF rescales the raw counts into weights between 0 and 1 (each row is L2-normalized by default), so the values are usually smaller, but the vector length stays the same.
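A quick sketch of this claim, using a hypothetical two-document corpus: both vectorizers yield matrices of identical shape, and the TF-IDF weights never exceed 1 because of the default L2 row normalization.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical corpus for comparing the two vectorizers.
corpus = ['apple orange apple', 'orange banana orange']

count_X = CountVectorizer().fit_transform(corpus)
tfidf_X = TfidfVectorizer().fit_transform(corpus)

# Same documents, same vocabulary -> same matrix shape.
print(count_X.shape == tfidf_X.shape)  # True

# TF-IDF rows are L2-normalized by default, so every weight is <= 1.
print(tfidf_X.toarray().max() <= 1.0)  # True
```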
The corpus words 'cat', 'dog', and 'mouse' are not in sklearn's built-in English stop-word list, so they all survive filtering and the vocabulary is non-empty. A ValueError ("empty vocabulary") is raised only when every word in the corpus is a stop word; that is not the case here, so no error occurs and option D is correct.
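This can be checked directly: a corpus of non-stop words vectorizes fine, while a corpus built only from stop words (here a made-up example, 'the and of') triggers the ValueError.

```python
from sklearn.feature_extraction.text import CountVectorizer

# 'cat', 'dog', 'mouse' are not in sklearn's English stop list,
# so a vocabulary survives and fit_transform succeeds.
ok = CountVectorizer(stop_words='english').fit_transform(['cat dog mouse'])
print(ok.shape)  # one document, three surviving features

# A corpus made entirely of stop words leaves nothing to index,
# so fitting raises ValueError ("empty vocabulary ...").
try:
    CountVectorizer(stop_words='english').fit_transform(['the and of'])
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```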
Short texts often consist largely of common words that carry little signal for classification. TfidfVectorizer with stop-word removal drops those common words and weights rare terms higher, helping the model focus on meaningful features.