Practice

1. Which of the following is a main limitation of classical NLP methods like bag-of-words?

easy

A. They ignore the order and context of words in a sentence.

B. They require very large datasets to work.

C. They always need deep neural networks to function.

D. They can understand sarcasm and irony easily.

Solution

Step 1: Understand classical NLP methods
Classical methods like bag-of-words treat text as a collection of words without order or context.
Step 2: Identify the limitation
This means they cannot capture meaning that depends on word order or surrounding words.
Final Answer:
They ignore the order and context of words in a sentence. -> Option A
Quick Check:
Classical methods miss context = A [OK]

Hint: Remember bag-of-words loses word order and context [OK]

Common Mistakes:

Thinking classical methods need big data
Believing classical methods use deep learning
Assuming classical methods understand sarcasm
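
The loss of word order described in the solution above can be seen in a minimal pure-Python sketch. The `bag_of_words` helper and the two example sentences are illustrative assumptions, not part of the original exercise; the helper simply counts whitespace-separated tokens.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count each token (order is discarded)."""
    return Counter(text.lower().split())

# Two sentences with opposite meanings that use exactly the same words.
bow_a = bag_of_words("the dog bit the man")
bow_b = bag_of_words("the man bit the dog")

# The representations are equal, so a model built on them cannot tell the sentences apart.
same_representation = bow_a == bow_b
print(same_representation)  # True
```

Because both sentences map to the same counts, any classifier that sees only these counts must treat them identically.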

2. Which snippet correctly performs bag-of-words feature extraction, a classical NLP method, in Python?

easy

A. import spacy
   nlp = spacy.load('en_core_web_sm')
   doc = nlp(text)

B. import tensorflow as tf
   model = tf.keras.Sequential()

C. from nltk.tokenize import word_tokenize
   words = word_tokenize(text)

D. from sklearn.feature_extraction.text import CountVectorizer
   vectorizer = CountVectorizer()
   X = vectorizer.fit_transform(texts)

Solution

Step 1: Identify classical method for feature extraction
In scikit-learn, bag-of-words features are built with CountVectorizer, which converts each text into a vector of word counts.
Step 2: Match syntax to bag-of-words
Option D imports CountVectorizer from sklearn.feature_extraction.text, creates a vectorizer, and calls fit_transform(texts), which is the correct usage for feature extraction. Options A and C only parse or tokenize text, and option B builds a neural network.
Final Answer:
The CountVectorizer snippet (import, CountVectorizer(), fit_transform(texts)) -> Option D
Quick Check:
CountVectorizer syntax = D [OK]

Hint: CountVectorizer is sklearn's bag-of-words tool [OK]

Common Mistakes:

Confusing tokenization with feature extraction
Using deep learning imports for classical methods
Mixing spaCy usage with bag-of-words

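3. Given texts = ['I love AI', 'love AI'] and X = CountVectorizer(token_pattern=r"(?u)\b\w+\b").fit_transform(texts), a bag-of-words setup that keeps single-character tokens such as 'I', what is the shape of the output matrix X?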

medium

A. (2, 4)

B. (3, 2)

C. (2, 3)

D. (4, 2)

Solution

Step 1: Count unique words in texts
Texts are ['I love AI', 'love AI']. Lowercased: 'i love ai', 'love ai'. If single-character tokens are kept, the unique tokens are 'ai', 'i', 'love' = 3 words.
Step 2: Check CountVectorizer behavior
CountVectorizer lowercases and tokenizes. There are 2 samples, so the shape is (2, 3). Note that the default token_pattern keeps only tokens of two or more characters, so with default settings 'i' is dropped and the shape would be (2, 2).
Final Answer:
(2, 3) -> Option C
Quick Check:
2 samples, 3 features = C [OK]

Hint: Count unique words for shape: rows=samples, cols=unique words [OK]

Common Mistakes:

Counting words instead of unique tokens
Mixing rows and columns in shape
Forgetting that CountVectorizer lowercases text by default
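
A shape question like this one is easy to verify by running the vectorizer. The sketch below compares scikit-learn's default settings with a token pattern that also keeps one-character tokens; the second pattern is an assumption chosen for illustration, and the variable names are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love AI', 'love AI']

# Default settings: lowercase, then keep tokens of two or more word characters.
default_vec = CountVectorizer()
X_default = default_vec.fit_transform(texts)

# Assumed alternative: a token pattern that also keeps one-character tokens like 'i'.
keep_short_vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X_keep_short = keep_short_vec.fit_transform(texts)

print(sorted(default_vec.vocabulary_), X_default.shape)        # ['ai', 'love'] (2, 2)
print(sorted(keep_short_vec.vocabulary_), X_keep_short.shape)  # ['ai', 'i', 'love'] (2, 3)
```

Rows always equal the number of documents; the column count depends on which tokens survive the vectorizer's tokenization rules.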

4. Identify the error in this classical NLP code snippet using CountVectorizer:

from sklearn.feature_extraction.text import CountVectorizer
texts = ['Hello world', 'Hello']
vectorizer = CountVectorizer()
X = vectorizer.fit(texts)
print(X.toarray())

medium

A. fit() should be fit_transform() to get the matrix.

B. CountVectorizer cannot process lists of strings.

C. toarray() is never available on CountVectorizer output, even after fit_transform().

D. Missing import for numpy.

Solution

Step 1: Check CountVectorizer usage
fit() learns the vocabulary but does not transform texts to matrix. fit_transform() does both.
Step 2: Identify correct method to get matrix
To get the document-term matrix, fit_transform() must be used. Using fit() alone returns the vectorizer object, which has no toarray() method.
Final Answer:
fit() should be fit_transform() to get the matrix. -> Option A
Quick Check:
fit_transform() needed for matrix = A [OK]

Hint: Use fit_transform() to get matrix, not just fit() [OK]

Common Mistakes:

Using fit() instead of fit_transform()
Assuming toarray() works on vectorizer
Thinking CountVectorizer needs numpy import
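
One possible corrected version of the snippet is sketched below. It also checks what fit() returns, which explains why the original call to toarray() fails; the names `fitted`, `returns_self`, and `dense` are introduced here only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['Hello world', 'Hello']
vectorizer = CountVectorizer()

# fit() only learns the vocabulary and returns the vectorizer itself.
fitted = vectorizer.fit(texts)
returns_self = fitted is vectorizer

# fit_transform() learns the vocabulary and returns the sparse count matrix.
X = vectorizer.fit_transform(texts)
dense = X.toarray().tolist()

print(returns_self)  # True
print(dense)         # [[1, 1], [1, 0]]
```

The columns correspond to 'hello' and 'world' after lowercasing, so the second document has a zero in the 'world' column.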

5. Why might classical NLP methods like bag-of-words fail on sentiment analysis of complex sentences such as 'I don't think this movie was good'?

hard

A. They cannot tokenize contractions like "don't".

B. They treat words independently and miss negation and word order.

C. They always overfit on small datasets.

D. They require GPU acceleration to process negations.

Solution

Step 1: Understand classical method limitations
Bag-of-words treats each word separately, ignoring order and context.
Step 2: Analyze sentence complexity
The sentence contains the negation "don't", which flips the sentiment of "good". Without word order or context, a model sees "good" as a positive signal and may misinterpret the sentiment.
Step 3: Identify why classical methods fail
Because they ignore word order and negation, they fail to capture true sentiment.
Final Answer:
They treat words independently and miss negation and word order. -> Option B
Quick Check:
Miss negation and order = B [OK]

Hint: Negation needs context; classical methods miss it [OK]

Common Mistakes:

Thinking classical methods need GPUs
Believing classical methods can't tokenize contractions
Confusing overfitting with context loss
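
The negation problem can be illustrated with a deliberately simple word-counting scorer. The lexicon and the `bow_sentiment` function below are hypothetical stand-ins for a trained bag-of-words model, which would learn per-word weights from data instead of using fixed scores.

```python
import re

# Hypothetical toy sentiment lexicon; a real model would learn these weights.
LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}

def bow_sentiment(text):
    """Sum per-word scores, ignoring word order and negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)

negated = bow_sentiment("I don't think this movie was good")
plain = bow_sentiment("I think this movie was good")

print(negated, plain)  # 1 1
```

Both sentences receive the same positive score because "good" is counted on its own and nothing ties "don't" to it.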

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.65	0.60	Model starts learning word patterns
2	0.55	0.68	Accuracy improves as model fits data better
3	0.50	0.75	Model converges but limited by simple features

Limitations of classical methods in NLP - Model Pipeline Trace
