The model trained on these examples predicts label 1 for 'good movie'.
Final Answer:
[1] -> Option B
Quick Check:
Prediction for 'good movie' = 1 [OK]
Hint: Model predicts label matching training example [OK]
Common Mistakes:
Assuming prediction returns multiple labels
Thinking the model is untrained and will raise an error
Confusing label 0 and 1
4. Identify the error in this code snippet for logistic regression on text:
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer
texts = ['happy', 'sad']
labels = [1, 0]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression()
model.fit(texts, labels)
medium
A. model.fit should use numeric features, not raw texts
B. CountVectorizer is not imported
C. fit_transform should be called on labels
D. Labels should be strings, not integers
Solution
Step 1: Check input to model.fit
LogisticRegression.fit expects a numeric feature matrix, but the code passes the raw list of strings in texts.
Step 2: Correct usage of vectorized data
Pass X, the vectorized text, instead of texts: model.fit(X, labels).
Final Answer:
model.fit should use numeric features, not raw texts -> Option A
Quick Check:
Model needs numbers, not raw text [OK]
Hint: Pass vectorized text, not raw strings, to model.fit [OK]
Common Mistakes:
Passing raw text instead of vectorized data
Confusing labels data type requirements
Ignoring import statements
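A corrected version of the snippet from question 4 might look like the sketch below. The only change to the original code is the argument to model.fit. The final predict call is an addition for illustration and is not part of the question.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['happy', 'sad']
labels = [1, 0]

# Turn each string into a row of token counts (a sparse numeric matrix).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Fit on the numeric matrix X, not on the raw strings in texts.
model = LogisticRegression()
model.fit(X, labels)

# Illustrative addition: new text must go through the same fitted
# vectorizer, using transform rather than fit_transform.
prediction = model.predict(vectorizer.transform(['happy']))
```

Reusing the fitted vectorizer keeps the column order of new inputs aligned with the columns the model saw during training.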
5. You trained a logistic regression model on text data using CountVectorizer. When testing on new sentences, the model predicts only one class for all inputs. What is the best way to improve the model's performance?
hard
A. Change logistic regression to linear regression
B. Remove CountVectorizer and use raw text directly
C. Use fewer training examples to avoid overfitting
D. Increase the number of training examples and use n-grams in CountVectorizer
Solution
Step 1: Understand cause of single-class prediction
Predicting a single class for every input suggests the model is underfitting, typically because the training set is too small or the features are too simple.
Step 2: Improve feature richness and data size
Adding training examples gives the model more signal to learn from, and n-grams (word sequences such as 'not good') capture context that single-word counts miss.
Final Answer:
Increase the number of training examples and use n-grams in CountVectorizer -> Option D
Quick Check:
More data + better features = better model [OK]
Hint: More data and richer features improve classification [OK]