Challenge - 5 Problems
Sentiment Analysis Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Model Choice
Difficulty: intermediate
Choosing the best model for sentiment analysis
You want to build a sentiment analysis model using scikit-learn on a dataset of movie reviews labeled positive or negative. Which model is most suitable for this binary text classification task?
💡 Hint
Think about models designed for classification, not clustering or regression.
✅ Explanation
LinearSVC is a linear classifier suitable for binary classification tasks like sentiment analysis. KMeans and DBSCAN are clustering algorithms, not classifiers. LinearRegression is for continuous output, not categories.
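To make this concrete, here is a minimal sketch (with made-up toy reviews, not a real dataset) showing LinearSVC used as a binary sentiment classifier behind a CountVectorizer:

```python
# Hypothetical toy example: LinearSVC for binary sentiment classification
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["great movie, loved it", "terrible plot, boring",
         "wonderful acting", "awful and dull"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Pipeline: turn text into count features, then fit a linear SVM classifier
clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["loved the wonderful acting"]))
```

The same pipeline shape works with any of scikit-learn's linear classifiers; clustering estimators like KMeans would ignore the labels entirely, which is why they are unsuitable here.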
❓ Predict Output
Difficulty: intermediate
Output of text vectorization step
What is the shape of the feature matrix X after applying CountVectorizer to 1000 text reviews, if the vocabulary size is 5000?
Python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['sample text data'] * 1000
vectorizer = CountVectorizer(max_features=5000)
X = vectorizer.fit_transform(texts)
print(X.shape)
# Note: this toy corpus contains only 3 distinct words, so the snippet as
# written prints (1000, 3). With a real review corpus whose vocabulary has
# at least 5000 words, the shape would be (1000, 5000) as stated above.
💡 Hint
Rows represent samples, columns represent features.
✅ Explanation
CountVectorizer transforms text into a matrix where each row is a sample and each column is a word feature. With 1000 samples and 5000 features, shape is (1000, 5000).
❓ Hyperparameter
Difficulty: advanced
Choosing the right hyperparameter for LogisticRegression
You train a LogisticRegression model for sentiment analysis. Which hyperparameter controls the strength of regularization to prevent overfitting?
💡 Hint
Look for the parameter that adjusts how much the model avoids complexity.
✅ Explanation
The 'C' parameter in LogisticRegression is the inverse of regularization strength. Smaller values specify stronger regularization, helping to reduce overfitting.
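A quick sketch (on assumed synthetic data, not sentiment text) makes the inverse relationship visible: shrinking C shrinks the learned coefficients.

```python
# Sketch: smaller C = stronger L2 regularization = smaller coefficients
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 20)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # labels depend on 2 features

strong = LogisticRegression(C=0.01).fit(X, y)   # strong regularization
weak = LogisticRegression(C=100.0).fit(X, y)    # weak regularization

# Total coefficient magnitude is smaller under the stronger penalty
print(np.abs(strong.coef_).sum(), np.abs(weak.coef_).sum())
```

In practice C is usually tuned with cross-validation (e.g. GridSearchCV over a log-spaced range) rather than set by hand.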
❓ Metrics
Difficulty: advanced
Interpreting classification report metrics
After training a sentiment classifier, you get these metrics for the positive class: precision=0.8, recall=0.5, f1-score=0.62. What does the low recall indicate?
💡 Hint
Recall measures how many actual positives are found.
✅ Explanation
Recall is the fraction of true positives found out of all actual positives. Low recall means many positive samples are missed (false negatives).
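The metrics in the question can be reproduced with a small made-up label set: 8 actual positives of which the model finds only 4 (recall = 4/8 = 0.5), with 1 false positive (precision = 4/5 = 0.8).

```python
# Sketch with assumed toy labels matching the question's metrics
from sklearn.metrics import precision_score, recall_score

y_true = [1] * 8 + [0] * 8   # 8 actual positives, 8 actual negatives
y_pred = [1, 1, 1, 1, 0, 0, 0, 0,   # 4 true positives, 4 false negatives
          1, 0, 0, 0, 0, 0, 0, 0]   # 1 false positive, 7 true negatives

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 4/5 = 0.8
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 4/8 = 0.5
```

The four false negatives are exactly the "missed" positive reviews that the low recall reports.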
🔧 Debug
Difficulty: expert
Debugging unexpected accuracy drop after vectorizer change
You trained a sentiment model with CountVectorizer and got 85% accuracy. After switching to TfidfVectorizer with default settings, accuracy dropped to 70%. What is the most likely cause?
💡 Hint
Think about how TF-IDF changes word importance compared to raw counts.
✅ Explanation
TfidfVectorizer scales down frequent words, which can reduce the weight of common sentiment words that were important in CountVectorizer, causing accuracy drop.