ML Python · ~20 mins

Sentiment analysis with scikit-learn in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Model Choice (intermediate)
Choosing the best model for sentiment analysis
You want to build a sentiment analysis model using scikit-learn on a dataset of movie reviews labeled positive or negative. Which model is most suitable for this binary text classification task?
A. LinearSVC (Support Vector Classifier with linear kernel)
B. KMeans clustering
C. LinearRegression
D. DBSCAN clustering
💡 Hint: Think about models designed for classification, not clustering or regression.
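For context, the correct model drops straight into a standard scikit-learn pipeline. A minimal sketch, using a hypothetical four-review toy corpus (the review texts are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: two positive and two negative reviews
texts = [
    "great movie loved it",
    "terrible plot hated it",
    "wonderful acting great fun",
    "awful boring hated everything",
]
labels = ["positive", "negative", "positive", "negative"]

# Vectorize the text, then classify with a linear-kernel SVM
clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["loved the great acting"]))
```

The same pipeline shape works with any scikit-learn classifier, which is why swapping vectorizers or models (as in the later problems) is a one-line change.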
Predict Output (intermediate)
Output of text vectorization step
What is the shape of the feature matrix X after applying CountVectorizer to 1000 text reviews, if the vocabulary size is 5000?
ML Python
from sklearn.feature_extraction.text import CountVectorizer
# Synthetic corpus with 5000 distinct tokens (5 unique words per review)
texts = [' '.join(f'word{5 * i + j}' for j in range(5)) for i in range(1000)]
vectorizer = CountVectorizer(max_features=5000)
X = vectorizer.fit_transform(texts)
print(X.shape)
A. (5000, 1000)
B. (1000, 5000)
C. (1000, 1)
D. (1, 5000)
💡 Hint: Rows represent samples, columns represent features.
Hyperparameter (advanced)
Choosing the right hyperparameter for LogisticRegression
You train a LogisticRegression model for sentiment analysis. Which hyperparameter controls the strength of regularization to prevent overfitting?
A. penalty (Type of regularization)
B. max_iter (Maximum iterations for solver)
C. C (Inverse of regularization strength)
D. solver (Algorithm to use in optimization)
💡 Hint: Look for the parameter that adjusts how much the model avoids complexity.
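To see the effect of C directly, you can fit two models on the same data with small and large C and compare the learned coefficients. A sketch on randomly generated data (the dataset here is invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: label depends on the first two features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

strong = LogisticRegression(C=0.01).fit(X, y)               # small C = strong regularization
weak = LogisticRegression(C=100.0, max_iter=1000).fit(X, y)  # large C = weak regularization

# Stronger regularization shrinks the coefficients toward zero
print(np.abs(strong.coef_).sum(), np.abs(weak.coef_).sum())
```

Because C is the *inverse* of regularization strength, decreasing it increases the penalty on large weights, which is what keeps the model from overfitting.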
Metrics (advanced)
Interpreting classification report metrics
After training a sentiment classifier, you get these metrics for the positive class: precision=0.8, recall=0.5, f1-score=0.62. What does the low recall indicate?
A. The model is overfitting the training data
B. The model predicts many false positives
C. The model has perfect accuracy
D. The model misses many positive reviews (false negatives are high)
💡 Hint: Recall measures how many actual positives are found.
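The reported numbers can be reproduced with a small hypothetical confusion matrix: 8 actual positives of which the model finds only 4 (recall 4/8 = 0.5), plus 1 false positive (precision 4/5 = 0.8). The counts below are invented to match the question's metrics:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# 8 actual positives, 12 actual negatives (hypothetical illustration)
y_true = [1] * 8 + [0] * 12
# Model catches only 4 of the 8 positives and raises 1 false alarm
y_pred = [1] * 4 + [0] * 4 + [1] * 1 + [0] * 11

print(precision_score(y_true, y_pred))       # 0.8
print(recall_score(y_true, y_pred))          # 0.5 -> half the positive reviews are missed
print(round(f1_score(y_true, y_pred), 2))    # 0.62
```

The 4 uncaught positives are the false negatives that drag recall down, even though precision stays high.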
🔧 Debug (expert)
Debugging unexpected accuracy drop after vectorizer change
You trained a sentiment model with CountVectorizer and got 85% accuracy. After switching to TfidfVectorizer with default settings, accuracy dropped to 70%. What is the most likely cause?
A. TfidfVectorizer normalizes word counts, which may reduce the impact of frequent sentiment words
B. TfidfVectorizer always produces fewer features than CountVectorizer, causing underfitting
C. CountVectorizer applies stemming by default, but TfidfVectorizer does not
D. TfidfVectorizer requires labels to be numeric, unlike CountVectorizer
💡 Hint: Think about how TF-IDF changes word importance compared to raw counts.
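A quick way to see the difference is to vectorize the same documents both ways: raw counts keep a repeated sentiment word's full weight, while TF-IDF rescales it by document frequency and L2-normalizes each row. A sketch on a two-document toy corpus (the documents are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["great great great movie", "bad movie"]

counts = CountVectorizer().fit_transform(docs).toarray()
tfidf = TfidfVectorizer().fit_transform(docs).toarray()

# Vocabulary (alphabetical): ['bad', 'great', 'movie']
print(counts[0])          # raw counts: 'great' keeps its full count of 3
print(tfidf[0].round(2))  # rescaled row with unit L2 norm
```

Because every TF-IDF row is squashed to unit length (norm='l2' by default), a word that occurred 3 times no longer carries 3x the raw weight, which is exactly the normalization effect answer A describes.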