Recall & Review
beginner
What is the Bag of Words model in text processing?
It is a way to represent text by counting how many times each word appears, ignoring grammar and word order.
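As a minimal sketch of the idea, the snippet below counts words in one sentence using only the Python standard library. The sample sentence and the simple whitespace tokenization are assumptions made for illustration; real tokenizers handle punctuation and casing more carefully.

```python
from collections import Counter

# Sample sentence chosen for illustration.
text = "the cat sat on the mat"

# Lowercase, split on whitespace, and count each token.
# The position of each word is discarded; only frequencies remain.
counts = Counter(text.lower().split())

print(counts["the"])  # 2
print(counts["cat"])  # 1
```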
beginner
What does CountVectorizer do in NLP?
CountVectorizer converts a collection of text documents into a matrix of word counts, showing how often each word appears in each document.
beginner
Why does Bag of Words ignore word order?
Because it tracks only how often each word occurs, not where it occurs, which reduces text to simple numeric features for machine learning.
intermediate
How does CountVectorizer handle new words not seen during training?
It ignores words not in its vocabulary, so new words in test data are not counted in the output matrix.
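A small sketch of this behavior, assuming default `CountVectorizer` settings and toy training and test sentences invented for the example: the word "parrot" appears only in the test document, so it gets no column and contributes nothing to the output.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy data, assumed for illustration.
train_docs = ["the cat sat", "the dog barked"]
test_docs = ["the parrot sat"]  # "parrot" never appears in training

vectorizer = CountVectorizer()
vectorizer.fit(train_docs)                # vocabulary is fixed here
X_test = vectorizer.transform(test_docs)  # unseen words are dropped

print(sorted(vectorizer.vocabulary_))  # learned vocabulary, no "parrot"
print(X_test.toarray())                # only "sat" and "the" are counted
```

The test document has three words, but its row sums to two because the unseen word is silently skipped.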
intermediate
What is a limitation of the Bag of Words model?
It discards word order and context, so it cannot capture phrases or sentence structure.
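To make the limitation concrete, here is a minimal sketch using a hypothetical `bag_of_words` helper (plain whitespace tokenization, not part of any library). Two sentences with opposite meanings produce identical word counts.

```python
from collections import Counter

def bag_of_words(text):
    """Hypothetical helper: lowercase, split on whitespace, count tokens."""
    return Counter(text.lower().split())

# Same words, different order, different meaning.
a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

print(a == b)  # True: the two representations are indistinguishable
```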
What does CountVectorizer output for a set of text documents?
CountVectorizer produces a document-term matrix: one row per document, one column per vocabulary word, and each cell holds that word's count in that document.
Which aspect does Bag of Words ignore?
Bag of Words counts words but does not consider the order they appear in.
If a new word appears in test data but not in training, what happens in CountVectorizer?
CountVectorizer ignores words not in its learned vocabulary.
Why is Bag of Words useful for machine learning?
Machine learning models need numbers, and Bag of Words turns text into numeric counts.
Which is a common problem with Bag of Words?
Bag of Words does not keep the order or meaning of words, only counts.
Explain how CountVectorizer transforms text data into a format usable by machine learning models.
Think about how text is turned into numbers by counting words.
Describe one main limitation of the Bag of Words model and why it matters.
Consider what information is lost when only counting words.