Challenge - 5 Problems
Logistic Regression Text Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Difficulty: intermediate
What is the output of this logistic regression prediction code?
Given a trained logistic regression model and a text vectorizer, what will be the predicted class for the input text?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['good movie', 'bad movie', 'great film', 'terrible film']
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)

new_text = ['good film']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction[0])
💡 Hint
Think about how the model was trained and what the new text contains.
📝 Explanation
The model was trained with label 1 for positive texts and 0 for negative. 'good film' combines 'good' (seen only in a positive example) with 'film' (seen in both classes), so the model predicts 1.
❓ Model Choice
Difficulty: intermediate
Which model is best suited for binary text classification with logistic regression?
You want to classify movie reviews as positive or negative using logistic regression. Which preprocessing and model pipeline is most appropriate?
💡 Hint
Consider which vectorization method captures word importance better for text.
📝 Explanation
TfidfVectorizer weights words by how informative they are (down-weighting terms that appear in every document), and LogisticRegression with L2 regularization helps prevent overfitting, making option A the best choice.
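A minimal sketch of that pipeline, using illustrative data rather than a real review corpus (the `make_pipeline` helper is one convenient way to chain the two steps):

```python
# Sketch: TF-IDF features feeding L2-regularized logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ['good movie', 'bad movie', 'great film', 'terrible film']
labels = [1, 0, 1, 0]

pipeline = make_pipeline(
    TfidfVectorizer(),                  # weights words by importance
    LogisticRegression(penalty='l2'),   # L2 is scikit-learn's default
)
pipeline.fit(texts, labels)
print(pipeline.predict(['great movie'])[0])
```

The pipeline keeps vectorizer and classifier fitted together, so `predict` can be called directly on raw strings.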
❓ Hyperparameter
Difficulty: advanced
Which hyperparameter change will most reduce overfitting in logistic regression for text?
You trained a logistic regression model on text data and see high training accuracy but low test accuracy. Which hyperparameter adjustment is best to reduce overfitting?
💡 Hint
Think about how regularization controls model complexity.
📝 Explanation
Decreasing C increases regularization strength (in scikit-learn, C is the inverse of the regularization strength), which penalizes large coefficients and reduces overfitting.
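A short sketch makes the inverse relationship concrete: fitting the same toy data with a large and a small C and comparing coefficient magnitudes (the values of C here are arbitrary examples):

```python
# Sketch: smaller C → stronger regularization → smaller coefficients.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['good movie', 'bad movie', 'great film', 'terrible film']
labels = [1, 0, 1, 0]
X = CountVectorizer().fit_transform(texts)

weak = LogisticRegression(C=10.0).fit(X, labels)    # weak regularization
strong = LogisticRegression(C=0.01).fit(X, labels)  # strong regularization

# Coefficient magnitudes shrink as C decreases.
print(np.abs(weak.coef_).max(), np.abs(strong.coef_).max())
```

In practice C would be tuned with cross-validation (e.g. `LogisticRegressionCV`) rather than picked by hand.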
❓ Metrics
Difficulty: advanced
Which metric is best to evaluate logistic regression on imbalanced text data?
You have a logistic regression model classifying rare positive events in text. Accuracy is high but misleading. Which metric should you use to better evaluate performance?
💡 Hint
Consider a metric that balances false positives and false negatives.
📝 Explanation
F1-score balances precision and recall, providing a better measure for imbalanced classes than accuracy alone.
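A tiny sketch with made-up predictions shows how accuracy can look strong while F1 exposes poor recall on the rare class:

```python
# Sketch: accuracy vs F1 on imbalanced labels (fabricated predictions).
from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 95 + [1] * 5            # 5% positive class
y_pred = [0] * 95 + [1, 0, 0, 0, 0]    # model finds only 1 of 5 positives

print(accuracy_score(y_true, y_pred))  # 0.96 — looks great
print(f1_score(y_true, y_pred))        # ~0.33 — reveals the missed positives
```

Here precision is 1.0 but recall is only 0.2, so F1 = 2·(1.0·0.2)/(1.0+0.2) ≈ 0.33, a far more honest summary than 96% accuracy.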
🔧 Debug
Difficulty: expert
Why does this logistic regression training code raise a ValueError?
Examine the code below and identify the cause of the ValueError during model training.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['happy', 'sad', 'joyful']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)
💡 Hint
Check if the input data and labels have matching lengths.
📝 Explanation
The labels list has 2 elements while texts has 3, so X has 3 rows but y has only 2 entries; fit raises a ValueError about inconsistent numbers of samples.
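One minimal fix is to supply a label for every text so X and y have matching row counts (the label chosen for 'joyful' here is an illustrative assumption):

```python
# Sketch of the fix: one label per text, so fit() sees matching shapes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['happy', 'sad', 'joyful']
labels = [1, 0, 1]  # one label per text ('joyful' assumed positive)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)  # no ValueError: X.shape[0] == len(labels)
print(model.predict(vectorizer.transform(['happy']))[0])
```

The general rule: scikit-learn estimators require `X.shape[0] == len(y)`, and the ValueError message reports the two inconsistent sample counts.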