Experiment - Text feature basics (CountVectorizer, TF-IDF)
Problem: You want to classify movie reviews as positive or negative from their text. The current model uses CountVectorizer features and overfits: training accuracy is very high, but validation accuracy is much lower.
Current Metrics: Training accuracy: 98%, Validation accuracy: 70%
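Issue: The model overfits because CountVectorizer produces a high-dimensional, sparse matrix of raw word counts. With that many features, the model may memorize words specific to the training reviews instead of learning patterns that generalize to new ones.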
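To make the difference between raw counts and TF-IDF weights concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the experiment's actual code: the three reviews are hypothetical toy data, tokenization is a plain whitespace split, and the helper names count_features and tfidf_features are made up for this example. The weighting follows the smoothed IDF formula, ln((1 + n) / (1 + df)) + 1, followed by L2 row normalization, which matches the default settings of scikit-learn's TfidfVectorizer (its tokenizer differs from the split used here).

```python
import math
from collections import Counter

# Hypothetical toy reviews, for illustration only.
docs = [
    "the movie was great",
    "the movie was terrible",
    "the plot was great fun",
]


def count_features(doc, vocab):
    """Raw term counts for one document, in vocabulary order."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocab]


def tfidf_features(docs):
    """Return (vocab, rows) where each row is an L2-normalized TF-IDF vector."""
    vocab = sorted({term for doc in docs for term in doc.split()})
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = {term: sum(term in doc.split() for doc in docs) for term in vocab}

    # Smoothed IDF: terms found in every document get the minimum weight of 1.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

    rows = []
    for doc in docs:
        counts = count_features(doc, vocab)
        raw = [count * idf[term] for count, term in zip(counts, vocab)]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0  # guard against empty docs
        rows.append([x / norm for x in raw])
    return vocab, rows


vocab, rows = tfidf_features(docs)

# Compare the two representations for the first review.
for term, count, weight in zip(vocab, count_features(docs[0], vocab), rows[0]):
    if count:
        print(f"{term:10s} count={count} tfidf={weight:.3f}")
```

In the first review every word has a count of 1, so raw counts treat "the" and "great" identically. Under TF-IDF, "the" and "was" appear in all three reviews and receive the lowest weight, while "great" is weighted higher. Down-weighting ubiquitous words is one way to reduce noise in the features; it does not by itself guarantee a smaller gap between training and validation accuracy.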