Handling imbalanced text data in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does 'imbalanced text data' mean in machine learning?
Imbalanced text data means some classes or categories have many more examples than others, making it hard for models to learn equally well from all classes.
beginner
Name one simple method to handle imbalanced text data.
One simple method is 'oversampling' the minority class by duplicating its examples to balance the dataset.
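As a rough illustration of random oversampling, the sketch below duplicates randomly chosen minority examples until every class matches the size of the largest class. The helper name `random_oversample` and the toy messages are hypothetical, and in practice resampling is normally applied to the training split only.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=0):
    """Duplicate random minority examples until all classes match the largest class."""
    rng = random.Random(seed)
    # Group the texts by their class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw duplicates (with replacement) to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels

# Hypothetical toy dataset: 4 "ham" messages, 1 "spam" message.
texts = ["see you at noon", "lunch tomorrow?", "meeting moved", "thanks!",
         "win a free prize now"]
labels = ["ham", "ham", "ham", "ham", "spam"]

bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(bal_labels))  # each class now has 4 examples
```

Because the duplicates are exact copies, a model can overfit to them, which is one reason synthetic generation is sometimes preferred.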
intermediate
What is 'undersampling' and when is it used?
Undersampling means reducing the number of examples in the majority class to balance the dataset. It is used when the majority class is very large and can be safely reduced without losing important information.
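A minimal sketch of random undersampling, assuming it is acceptable to discard majority examples at random: each class is cut down to the size of the smallest class. The helper name `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_undersample(texts, labels, seed=0):
    """Randomly keep only as many examples per class as the smallest class has."""
    rng = random.Random(seed)
    # Group the texts by their class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = min(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample without replacement, so no example is repeated.
        out_texts.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels

# Hypothetical toy dataset: 6 "ham" messages, 2 "spam" messages.
texts = ["hi", "see you soon", "call me", "ok", "thanks", "on my way",
         "free prize", "claim your reward"]
labels = ["ham"] * 6 + ["spam"] * 2

under_texts, under_labels = random_undersample(texts, labels)
print(Counter(under_labels))  # each class now has 2 examples
```

Four of the six "ham" messages are thrown away here, which shows the trade-off: the classes are balanced, but the model sees less majority-class data.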
intermediate
How can synthetic data generation help with imbalanced text data?
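Synthetic data generation creates new, artificial examples of the minority class to increase its size and help the model learn better. For text, this can mean applying SMOTE to numeric feature vectors (such as TF-IDF vectors or embeddings, since SMOTE does not operate on raw strings) or augmenting the text itself, for example with synonym replacement or back-translation.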
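SMOTE works on numeric feature vectors rather than raw strings, so text is vectorized first. The sketch below is a simplified, SMOTE-style interpolation written from scratch for illustration. It is not the imbalanced-learn implementation, and the function name `smote_like` and the feature values are hypothetical.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic vectors by interpolating between a minority
    vector and one of its k nearest minority neighbours (simplified sketch)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k closest other minority vectors to the chosen base vector.
        others = [v for v in minority if v is not base]
        neighbours = sorted(others, key=lambda v: sq_dist(base, v))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical feature vectors (e.g. TF-IDF weights) for three minority documents.
minority_vectors = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2], [0.7, 0.2, 0.1]]
new_vectors = smote_like(minority_vectors, n_new=4)
print(len(new_vectors))  # 4 synthetic minority vectors
```

Each synthetic vector lies on the line segment between two real minority vectors, so it resembles existing minority data without being an exact copy.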
beginner
Why is accuracy not a good metric for imbalanced text classification?
Accuracy can be misleading because a model can predict the majority class all the time and still get high accuracy, ignoring the minority class performance.
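To make this concrete, the sketch below scores a "model" that always predicts the majority class on a hypothetical 95/5 split. The metrics are computed by hand so the example needs no external library, and the label names are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class; each is 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical test set: 95 "ham" and 5 "spam"; the model always says "ham".
y_true = ["ham"] * 95 + ["spam"] * 5
y_pred = ["ham"] * 100

acc = accuracy(y_true, y_pred)
spam_scores = precision_recall_f1(y_true, y_pred, positive="spam")
print(acc)          # 0.95
print(spam_scores)  # (0.0, 0.0, 0.0)
```

Accuracy is 95% even though not a single spam message is caught, while the minority-class precision, recall, and F1 are all zero.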
What is a common problem when training on imbalanced text data?
A. The model ignores minority classes
B. The model trains faster
C. The model always predicts minority classes
D. The model requires no preprocessing
Which technique involves creating new, artificial examples for the minority class rather than copying existing ones?
A. Undersampling
B. Oversampling
C. Feature scaling
D. Synthetic data generation
Why might undersampling be risky?
A. It can remove useful data from the majority class
B. It increases dataset size
C. It duplicates minority class data
D. It always improves accuracy
Which metrics are more informative than accuracy for imbalanced text classification?
A. F1-score
B. Recall
C. Precision
D. All of the above
What does oversampling do?
A. Removes majority class examples
B. Duplicates minority class examples
C. Creates synthetic majority class data
D. Normalizes text data
Explain why handling imbalanced text data is important and describe two methods to address it.
Think about how models learn better with balanced examples.
Describe how synthetic data generation can help with imbalanced text data and name a technique used for it.
It’s like making new examples similar to existing minority data.