Experiment - Why text classification categorizes documents
Problem:We want to teach a computer to read short text messages and decide what category they belong to, like 'sports', 'technology', or 'health'. Currently, the model guesses correctly 95% of the time on the training messages but only 70% on new messages it hasn't seen before.
Current Metrics:Training accuracy: 95%, Validation accuracy: 70%
Issue:The model is overfitting. It learns the training messages too well but does not generalize to new messages.