Logistic regression for text in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
What is the output of this logistic regression prediction code?
Given a trained logistic regression model and a text vectorizer, what will be the predicted class for the input text?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['good movie', 'bad movie', 'great film', 'terrible film']
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)

new_text = ['good film']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction[0])
A. 0
B. Raises a NotFittedError because the model is not trained
C. Raises a ValueError due to a shape mismatch
D. 1
Hint: Think about how the model was trained and what the new text contains.
Model Choice (intermediate)
Which pipeline is best suited for binary text classification with logistic regression?
You want to classify movie reviews as positive or negative using logistic regression. Which preprocessing and model pipeline is most appropriate?
A. Use TfidfVectorizer to convert text to TF-IDF features, then LogisticRegression with L2 regularization
B. Use raw text directly as input to LogisticRegression without vectorization
C. Use CountVectorizer to convert text to counts, then LogisticRegression with default parameters
D. Use PCA on raw text data, then LogisticRegression
Hint: Consider which vectorization method better captures word importance for text.
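A TF-IDF plus logistic regression pipeline can be sketched with scikit-learn's `make_pipeline`, which chains the vectorizer and the classifier so that raw strings go in and labels come out. This is a minimal sketch on a hypothetical four-review corpus; `LogisticRegression` already applies an L2 penalty by default, and `C=1.0` (the default) is written out only to make the regularization strength visible.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews: 1 = positive, 0 = negative.
reviews = ['good movie', 'bad movie', 'great film', 'terrible film']
sentiment = [1, 0, 1, 0]

# TfidfVectorizer down-weights words that occur in many documents
# ('movie', 'film') relative to rarer, more discriminative words.
# LogisticRegression uses an L2 penalty by default; C is the inverse
# of the regularization strength.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(C=1.0))
clf.fit(reviews, sentiment)

# The fitted pipeline vectorizes raw strings itself, using the training vocabulary.
print(clf.predict(['great movie', 'terrible movie']))
```

Wrapping both steps in one pipeline also keeps the vectorizer from being refit on test data, which is an easy mistake to make when the two steps are called separately.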
Hyperparameter (advanced)
Which hyperparameter change will most reduce overfitting in logistic regression for text?
You trained a logistic regression model on text data and see high training accuracy but low test accuracy. Which hyperparameter adjustment is best to reduce overfitting?
A. Set solver to 'liblinear'
B. Increase the maximum number of iterations
C. Increase the regularization strength by decreasing the value of C
D. Use a smaller vocabulary size in the vectorizer
Hint: Think about how regularization controls model complexity.
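To see how `C` controls model complexity, one can refit the same model with several values of `C` and compare the size of the learned weight vector. This is a minimal sketch that assumes the toy corpus from the first problem; the specific `C` values are arbitrary choices for illustration, and `max_iter=1000` is raised only so the weakly regularized fit converges without warnings.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['good movie', 'bad movie', 'great film', 'terrible film']
labels = [1, 0, 1, 0]
X = CountVectorizer().fit_transform(texts)

# In scikit-learn, C is the *inverse* of regularization strength:
# a smaller C puts a heavier L2 penalty on the coefficients.
norms = {}
for C in [0.01, 1.0, 100.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X, labels)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C:>6}: ||w|| = {norms[C]:.3f}")
```

Smaller weights mean the model relies less heavily on any single word, which is what limits overfitting. In practice `C` is usually chosen by cross-validation, for example with `GridSearchCV` or `LogisticRegressionCV`, rather than by inspecting norms.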
Metrics (advanced)
Which metric is best to evaluate logistic regression on imbalanced text data?
You have a logistic regression model classifying rare positive events in text. Accuracy is high but misleading. Which metric should you use to better evaluate performance?
A. F1-score
B. Precision
C. Recall
D. Accuracy
Hint: Consider a metric that balances false positives and false negatives.
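The following sketch shows why accuracy can mislead on imbalanced data: two hypothetical classifiers reach the same accuracy, but only one of them finds any positives. The label vectors are made up for illustration and do not come from a trained model.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 18 negatives and only 2 rare positives.
y_true = [0] * 18 + [1, 1]

# Classifier A always predicts the majority class.
y_pred_a = [0] * 20
# Classifier B catches one positive but also raises one false alarm.
y_pred_b = [0] * 17 + [1, 1, 0]

for name, y_pred in [("always-negative", y_pred_a), ("classifier B", y_pred_b)]:
    acc = accuracy_score(y_true, y_pred)
    # zero_division=0 avoids a warning when no positives are predicted.
    prec = precision_score(y_true, y_pred, zero_division=0)
    rec = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred, zero_division=0)
    print(f"{name:>15}: accuracy={acc:.2f} precision={prec:.2f} "
          f"recall={rec:.2f} F1={f1:.2f}")
```

Both classifiers score 0.90 accuracy, yet the always-negative one has an F1-score of 0 because it never identifies a positive example.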
Debug (expert)
Why does this logistic regression training code raise a ValueError?
Examine the code below and identify the cause of the ValueError during model training.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['happy', 'sad', 'joyful']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)
A. LogisticRegression requires dense input, so the sparse matrix causes an error
B. A mismatch between the number of texts and the number of labels causes the ValueError
C. CountVectorizer cannot process single-word texts
D. Labels must be strings; integer labels cause an error
Hint: Check whether the input data and the labels have matching lengths.
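A minimal sketch of how to confirm and fix this kind of error: compare the number of rows in the feature matrix with the number of labels before calling `fit`. The label `1` for `'joyful'` is an assumption added only so the lengths match; the original snippet does not specify it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['happy', 'sad', 'joyful']
labels = [1, 0]  # one label short, as in the original snippet

X = CountVectorizer().fit_transform(texts)
print("samples in X:", X.shape[0], "| labels:", len(labels))

# scikit-learn checks that X and y describe the same number of samples
# and raises a ValueError before any training happens.
try:
    LogisticRegression().fit(X, labels)
except ValueError as err:
    print("ValueError:", err)

# Fix: supply exactly one label per text. The label for 'joyful' (1) is assumed.
labels = [1, 0, 1]
assert X.shape[0] == len(labels), "every text needs exactly one label"
model = LogisticRegression().fit(X, labels)
print("fitted classes:", model.classes_)
```

An explicit length check like the `assert` above fails with a clearer message than the library error when labels are loaded from a separate file.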