Challenge - 5 Problems
Imbalanced Text Data Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
Intermediate
Why use SMOTE for imbalanced text data?
You have a text classification task with very few examples of one class. Why might you use SMOTE (Synthetic Minority Over-sampling Technique) on the text features?
💡 Hint
Think about how SMOTE creates new data points rather than just copying existing ones.
✓ Explanation
SMOTE generates new synthetic samples by interpolating between existing minority class feature vectors, which helps the model learn better decision boundaries.
❓ Predict Output
Intermediate
Output of class distribution after random oversampling
Given the following code that uses RandomOverSampler on text data features, what will be the printed class distribution?
NLP
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from imblearn.over_sampling import RandomOverSampler

texts = ['good', 'bad', 'good', 'bad', 'bad', 'good', 'good', 'bad', 'bad', 'bad']
labels = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(X, labels)
print(Counter(y_res))
💡 Hint
RandomOverSampler balances classes by duplicating minority class samples until counts match.
✓ Explanation
The original counts are 0:6 and 1:4. RandomOverSampler duplicates minority class (1) samples to match the majority class (0) count of 6, so both classes end up with 6 samples each.
❓ Model Choice
Advanced
Best model choice for imbalanced text classification
You have a highly imbalanced text dataset with 95% negative and 5% positive labels. Which model choice is best to handle this imbalance?
💡 Hint
Consider models that can adjust learning to pay more attention to minority class.
✓ Explanation
Logistic regression with class_weight='balanced' reweights the loss to give more importance to the minority class, helping the model handle the imbalance.
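The effect of class_weight='balanced' can be sketched with scikit-learn's helper that computes the same weights (the 95/5 split mirrors the question; this is an illustration, not the grader's solution):

```python
import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# 95% negative, 5% positive labels, as in the question.
y = np.array([0] * 95 + [1] * 5)

# 'balanced' assigns each class the weight n_samples / (n_classes * count),
# so errors on the rare class cost far more in the loss.
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # roughly {0: 0.53, 1: 10.0}

# Passing the same option to the model applies these weights during training.
clf = LogisticRegression(class_weight="balanced")
```

With the rare class weighted about 19 times more heavily, misclassifying one of its 5 examples hurts the objective as much as misclassifying 19 majority examples.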
❓ Hyperparameter
Advanced
Choosing the right threshold for imbalanced text classification
After training a binary text classifier on imbalanced data, you notice low recall for the minority class. Which hyperparameter adjustment can help improve recall?
💡 Hint
Recall improves when the model predicts more positive cases, even if some are false positives.
✓ Explanation
Lowering the threshold makes the model label more samples as positive, increasing recall but possibly lowering precision.
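The threshold effect can be sketched with made-up predicted probabilities (the scores and labels below are illustrative, not from any real model):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up P(positive) scores from some classifier, and the true labels.
proba_pos = np.array([0.70, 0.45, 0.35, 0.20, 0.55, 0.40])
y_true = np.array([1, 1, 1, 0, 0, 0])

def predict_with_threshold(proba, threshold):
    """Label a sample positive whenever P(positive) >= threshold."""
    return (proba >= threshold).astype(int)

default = predict_with_threshold(proba_pos, 0.5)
lowered = predict_with_threshold(proba_pos, 0.3)

print(recall_score(y_true, default))     # 1/3: two positives are missed
print(recall_score(y_true, lowered))     # 1.0: all positives are caught
print(precision_score(y_true, lowered))  # 0.6: the cost is false positives
```

Lowering the cutoff from 0.5 to 0.3 flips three more samples to positive, which captures every true positive but also admits two false ones, exactly the recall-for-precision trade the explanation describes.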
❓ Metrics
Expert
Choosing the best metric for imbalanced text data evaluation
You trained a text classifier on imbalanced data. Which metric is best to evaluate model performance focusing on minority class detection?
💡 Hint
Accuracy can be misleading when classes are imbalanced.
✓ Explanation
F1-score combines precision and recall, giving a better sense of minority class detection quality than accuracy.
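Why accuracy misleads here can be shown with a degenerate classifier on the question's 95/5 split (a sketch; the "model" simply predicts the majority class every time):

```python
from sklearn.metrics import accuracy_score, f1_score

# 95 negatives, 5 positives; a "classifier" that always predicts negative.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))             # 0.95, looks strong
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0, exposes the failure
```

Accuracy rewards the model for ignoring the minority class entirely, while F1 drops to zero because both precision and recall on the positive class are zero.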