Recall & Review
beginner
What is the Bag of Words (BoW) model in text processing?
Bag of Words is a simple way to represent text by counting how many times each word appears, ignoring grammar and order. It turns text into numbers that machines can understand.
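To make the counting concrete, here is a minimal sketch in plain Python. The helper names `bag_of_words` and `to_vector`, the two sample sentences, and the whitespace tokenizer are assumptions made for illustration; real pipelines usually handle punctuation and tokenization more carefully.

```python
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercase, whitespace-separated word appears."""
    return Counter(text.lower().split())

def to_vector(counts, vocabulary):
    """Turn the counts into a fixed-length list, one slot per vocabulary word."""
    return [counts[word] for word in vocabulary]

# Hypothetical sample documents, chosen only for illustration.
docs = ["the cat sat on the mat", "the dog sat"]

bags = [bag_of_words(d) for d in docs]

# The shared vocabulary is every distinct word seen, in sorted order.
vocabulary = sorted(set().union(*bags))

# Each document becomes a vector of counts; word order is discarded.
vectors = [to_vector(b, vocabulary) for b in bags]
```

Each position in a vector corresponds to one vocabulary word, so documents of different lengths end up as number lists of the same size.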
beginner
What does TF-IDF stand for and what is its purpose?
TF-IDF stands for Term Frequency-Inverse Document Frequency. It helps find important words in a document by giving higher scores to words that appear often in one document but rarely in others.
intermediate
How does TF (Term Frequency) differ from IDF (Inverse Document Frequency)?
TF counts how often a word appears in one document. IDF measures how rare a word is across all documents. Combining them highlights words that are common in one document but rare overall.
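The combination can be sketched with one common textbook definition: TF as a word's count divided by the document length, and IDF as the natural log of (number of documents / number of documents containing the word). The function name `tf_idf` and the sample documents are assumptions for illustration. Libraries often use smoothed variants of these formulas, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document, using unsmoothed TF-IDF."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # TF (share of this document) times IDF (rarity across documents).
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical sample documents, chosen only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew away",
]
scores = tf_idf(docs)
```

With these documents, 'the' appears everywhere, so its IDF is log(1) = 0 and its score is zero, while 'cat' appears in only one document and scores higher than 'sat', which appears in two.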
intermediate
Why might Bag of Words be less effective than TF-IDF for some tasks?
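Bag of Words weights each word by its raw count alone, so very frequent words like 'the' can dominate the representation even though they carry little meaning. TF-IDF reduces the weight of words that appear in most documents, making it better at surfacing the words that distinguish one document from another.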
advanced
What is a limitation of both Bag of Words and TF-IDF models?
Both ignore the order of words, so they miss the meaning that comes from word sequence. For example, 'dog bites man' and 'man bites dog' look the same in these models.
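The 'dog bites man' example can be checked directly. The sketch below compares the word counts of the two sentences. As a contrast, it also counts adjacent word pairs (bigrams), one common way to keep a little order information; the bigram helper is an illustrative addition, not something these cards cover.

```python
from collections import Counter

# Word counts for the two sentences: order is thrown away.
a = Counter("dog bites man".split())
b = Counter("man bites dog".split())
same_bag = a == b  # True: both contain dog, bites, man once each

def bigrams(text):
    """Count adjacent word pairs, which keeps some local word order."""
    tokens = text.split()
    return Counter(zip(tokens, tokens[1:]))

# ("dog", "bites") differs from ("man", "bites"), so the two no longer match.
same_bigrams = bigrams("dog bites man") == bigrams("man bites dog")  # False
```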
What does the Bag of Words model ignore when representing text?
Bag of Words counts word frequency but ignores the order in which words appear.
In TF-IDF, what does a high IDF score for a word indicate?
High IDF means the word is rare across the set of documents.
Which model gives more importance to rare but meaningful words?
TF-IDF gives higher weight to words that are rare across the document collection, which highlights their importance. Bag of Words only counts occurrences.
Which of these is a common use of Bag of Words and TF-IDF?
Both models are used to convert text into numbers for text classification tasks.
What is a shared limitation of Bag of Words and TF-IDF?
Both models ignore the order of words, losing context.
Explain how the Bag of Words model represents text and one advantage and one disadvantage of this approach.
Think about how words are counted and what information is lost.
Describe what TF-IDF measures and why it can be more useful than just counting word frequency.
Consider how TF-IDF balances common and rare words.