Bag of Words (CountVectorizer) in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems

❓ Predict Output

intermediate

Output of CountVectorizer on simple text

What is the output of the following code snippet using CountVectorizer from scikit-learn?

from sklearn.feature_extraction.text import CountVectorizer
corpus = ['apple banana apple', 'banana orange']
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
result = X.toarray()
vocab = vectorizer.get_feature_names_out()
print(vocab)
print(result)

A. ['apple' 'orange' 'banana']
   [[2 0 1]
    [0 1 1]]

B. ['banana' 'apple' 'orange']
   [[1 2 0]
    [1 0 1]]

C. ['apple' 'banana' 'orange']
   [[2 1 0]
    [0 1 1]]

D. ['apple' 'banana' 'orange']
   [[1 2 0]
    [0 1 1]]

🧠 Conceptual

intermediate

Understanding vocabulary size in CountVectorizer

Given the corpus: ['cat dog', 'dog mouse', 'cat mouse dog'], what is the vocabulary size created by CountVectorizer with default settings?

A. 3

B. 5

C. 4

D. 2
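
One way to check an answer to this question is to fit the vectorizer and count the entries in its learned vocabulary. The sketch below assumes scikit-learn is installed and reads the fitted `vocabulary_` attribute; the variable names are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['cat dog', 'dog mouse', 'cat mouse dog']

vectorizer = CountVectorizer()  # default settings
vectorizer.fit(corpus)

# vocabulary_ maps each learned token to its column index,
# so its length is the vocabulary size
vocab_size = len(vectorizer.vocabulary_)
print(sorted(vectorizer.vocabulary_))
print(vocab_size)
```

Repeated words across documents do not add new entries; only distinct tokens count toward the size.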

❓ Hyperparameter

advanced

Effect of stop_words parameter in CountVectorizer

What vocabulary does CountVectorizer produce when fitted on ['the cat sat', 'the dog barked'] with stop_words='english'?

A. ['the', 'cat', 'sat', 'dog', 'barked']

B. ['barked', 'cat', 'dog', 'sat']

C. ['cat', 'dog']

D. ['the']

❓ Metrics

advanced

Calculating document frequency with CountVectorizer

Using CountVectorizer on ['apple apple banana', 'banana orange', 'apple orange orange'], what is the document frequency (number of documents containing the word) for 'apple'?

A. 2

B. 3

C. 1

D. 0

🔧 Debug

expert

Identifying error in CountVectorizer usage

What error will the following code raise?

from sklearn.feature_extraction.text import CountVectorizer
corpus = ['hello world', 123, 'hello']
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

A. AttributeError: 'int' object has no attribute 'lower'

B. ValueError: empty vocabulary; perhaps the documents only contain stop words

C. No error, code runs successfully

D. TypeError: expected string or bytes-like object

Practice

(1/5)

1. What does the Bag of Words model do in text processing?

easy

A. Counts how often each word appears in the text

B. Translates text into another language

C. Removes all punctuation from the text

D. Generates summaries of the text
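
To make the idea concrete, here is a minimal Bag of Words sketch using only the Python standard library. It assumes a simplified tokenizer (lowercase, split on whitespace), which is not the same tokenization CountVectorizer applies; `bag_of_words` is a hypothetical helper name.

```python
from collections import Counter

def bag_of_words(text):
    """Return a mapping from each word to the number of times it occurs."""
    # Simplified tokenization: lowercase, then split on whitespace
    tokens = text.lower().split()
    return Counter(tokens)

counts = bag_of_words("the cat sat on the mat")
print(counts)
```

Word order is discarded: "the cat sat" and "sat the cat" produce the same counts.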

Solution

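Step 1: Understand Bag of Words purpose
Bag of Words represents a text by how many times each word occurs, ignoring word order.

Step 2: Compare options to definition
Only option A describes counting words. Translation (B), punctuation removal (C), and summarization (D) are different tasks.

Final Answer: A. Counts how often each word appears in the text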

Quick Check:

Solution

Step 1: Recall correct import path

Step 2: Match options to correct syntax

Final Answer:

Quick Check:

Solution

Step 1: Identify unique words

Step 2: Count sentences and features

Final Answer:

Quick Check:

Solution

Step 1: Identify deprecated method

Step 2: Use correct method

Final Answer:

Quick Check:

Solution

Step 1: Understand max_df parameter

Step 2: Compare other options

Final Answer:

Quick Check: