Recall & Review
beginner
What does 'imbalanced text data' mean in machine learning?
Imbalanced text data means some classes or categories have many more examples than others, making it hard for models to learn equally well from all classes.
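A quick way to see imbalance is to count the examples per class. The sketch below uses a made-up spam/ham dataset; the texts, labels, and the `imbalance_ratio` name are illustrative assumptions, not part of the original card.

```python
from collections import Counter

# Hypothetical toy dataset: (text, label) pairs with far more "ham" than "spam".
samples = [("meeting at noon", "ham")] * 95 + [("win a free prize", "spam")] * 5

# Count how many examples each class has.
class_counts = Counter(label for _, label in samples)

# Ratio of the largest class to the smallest; 1.0 would mean perfectly balanced.
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

print(class_counts)
print(imbalance_ratio)
```

A ratio far above 1 is a signal that the minority class may need special handling.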
beginner
Name one simple method to handle imbalanced text data.
One simple method is 'oversampling' the minority class by duplicating its examples to balance the dataset.
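Random oversampling can be sketched with the standard library alone. The `oversample` helper and the toy data are hypothetical names chosen for illustration; this is a minimal version under those assumptions, not a reference implementation.

```python
import random
from collections import Counter

def oversample(samples, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 8 "ham" messages and 2 "spam" messages.
data = [("ok", "ham")] * 8 + [("free prize", "spam"), ("win cash", "spam")]
balanced = oversample(data)
counts = Counter(label for _, label in balanced)
print(counts)
```

Oversampling is normally applied to the training split only, so that duplicated examples do not leak into the evaluation data.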
intermediate
What is 'undersampling' and when is it used?
Undersampling means reducing the number of examples in the majority class to balance the dataset. It is used when the majority class is very large and can be safely reduced without losing important information.
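A minimal sketch of random undersampling follows. The `undersample` helper and the sample data are hypothetical; the function simply keeps a random subset of each class equal in size to the smallest class.

```python
import random
from collections import Counter

def undersample(samples, seed=0):
    """Randomly drop examples so every class shrinks to the size of the smallest one."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        # Sample without replacement, so no example appears twice.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 90 "ham" messages and 10 "spam" messages.
data = [(f"ham message {i}", "ham") for i in range(90)]
data += [(f"spam message {i}", "spam") for i in range(10)]
reduced = undersample(data)
reduced_counts = Counter(label for _, label in reduced)
print(reduced_counts)
```

Here 80 of the 90 majority examples are discarded, which illustrates why the method suits cases where the majority class has plenty of redundancy.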
intermediate
How can synthetic data generation help with imbalanced text data?
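Synthetic data generation creates new, artificial examples of the minority class to increase its size and help the model learn it better. One technique is SMOTE, which interpolates between existing minority examples; for text, it is applied to numeric features (such as TF-IDF vectors) rather than to raw strings.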
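The idea behind SMOTE can be illustrated with a simplified, standard-library sketch. This is not the imbalanced-learn implementation: the `smote_like` helper is hypothetical, it uses only the single nearest neighbour, and the input vectors stand in for text that has already been converted to numeric features.

```python
import math
import random

def smote_like(minority_vectors, n_new, seed=0):
    """Create n_new synthetic vectors by interpolating between a minority
    vector and its nearest minority neighbour (simplified, one neighbour only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_vectors)
        # Nearest other minority vector by Euclidean distance.
        neighbour = min(
            (v for v in minority_vectors if v is not base),
            key=lambda v: math.dist(base, v),
        )
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class feature vectors (e.g. word counts for 3 terms).
minority = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [1.0, 1.0, 3.0]]
new_vectors = smote_like(minority, n_new=4)
print(new_vectors)
```

Each synthetic vector lies on the line segment between two real minority vectors, so it resembles existing minority data without being an exact copy.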
beginner
Why is accuracy not a good metric for imbalanced text classification?
Accuracy can be misleading because a model can predict the majority class every time and still score high accuracy while never identifying a single minority-class example.
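A small worked example makes the problem concrete. The labels and the 95/5 split below are made up for illustration; the "model" is a degenerate one that always predicts the majority class.

```python
# Hypothetical test set: 95 "ham" and 5 "spam" messages.
y_true = ["ham"] * 95 + ["spam"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["ham"] * 100

# Fraction of all predictions that are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Fraction of actual spam messages that the model caught.
spam_found = sum(t == "spam" and p == "spam" for t, p in zip(y_true, y_pred))
spam_recall = spam_found / y_true.count("spam")

print(accuracy)     # 0.95
print(spam_recall)  # 0.0
```

The model looks strong at 95% accuracy even though it detects none of the spam.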
What is a common problem when training on imbalanced text data?
Models tend to ignore minority classes because they see fewer examples, leading to poor predictions for those classes.
Which technique involves creating new examples for the minority class?
Synthetic data generation creates new artificial examples to increase minority class size.
Why might undersampling be risky?
Removing too many majority class examples can lose important information and hurt model performance.
Which metrics are more informative than accuracy for imbalanced text classification?
Precision, recall, and F1-score give better insight into minority class performance.
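These three metrics can be computed by hand from true positives, false positives, and false negatives. The `precision_recall_f1` helper and the predictions below are hypothetical, chosen only to show the arithmetic for the minority class.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical results: 5 spam (4 caught, 1 missed), 95 ham (2 wrongly flagged).
y_true = ["spam"] * 5 + ["ham"] * 95
y_pred = ["spam"] * 4 + ["ham"] * 1 + ["spam"] * 2 + ["ham"] * 93

precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="spam")
print(precision, recall, f1)
```

With 4 true positives, 2 false positives, and 1 false negative, precision is 4/6, recall is 4/5, and F1 is 8/11, all of which describe minority-class performance directly.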
What does oversampling do?
Oversampling duplicates or adds more examples to the minority class to balance the dataset.
Explain why handling imbalanced text data is important and describe two methods to address it.
Think about how models learn better with balanced examples.
Describe how synthetic data generation can help with imbalanced text data and name a technique used for it.
It’s like making new examples similar to existing minority data.