Experiment - LDA with scikit-learn
Problem: We want to find topics in a collection of text documents using Latent Dirichlet Allocation (LDA). The current model fits the training data well but performs poorly on unseen documents.
Current Metrics: Training perplexity: 1200, Validation perplexity: 1800
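A minimal sketch of how the train/validation perplexity gap could be measured with scikit-learn. The corpus, split ratio, and topic count here are illustrative placeholders, not the actual experiment's data or settings:

```python
# Sketch: measure train vs. validation perplexity for an LDA model.
# The toy corpus and hyperparameters are illustrative, not the real setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split

docs = [
    "cats and dogs are pets",
    "dogs chase cats in the yard",
    "stocks and bonds are investments",
    "investors trade stocks daily",
    "pets need food and care",
    "bond markets moved on rate news",
]

train_docs, val_docs = train_test_split(docs, test_size=0.33, random_state=0)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_val = vectorizer.transform(val_docs)  # reuse the training vocabulary

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X_train)

# Lower perplexity is better; a large train/validation gap signals overfitting.
print("train perplexity:", lda.perplexity(X_train))
print("val perplexity:  ", lda.perplexity(X_val))
```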
Issue: The model is overfitting: training perplexity (1200) is well below validation perplexity (1800), indicating poor generalization to unseen documents.
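Common levers for reducing this kind of overfitting in scikit-learn's LDA are fewer topics, stronger Dirichlet smoothing priors, and vocabulary pruning. A hedged sketch with illustrative (untuned) values; the corpus is again a placeholder:

```python
# Sketch: regularizing scikit-learn LDA against overfitting.
# All parameter values are illustrative starting points, not tuned results.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "cats dogs pets",
    "dogs cats pets",
    "stocks bonds markets",
    "bonds stocks markets",
    "pets cats dogs",
    "markets stocks bonds",
]

# Prune rare and near-ubiquitous terms to shrink the vocabulary.
vectorizer = CountVectorizer(min_df=2, max_df=0.9)
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(
    n_components=2,        # fewer topics limits model capacity
    doc_topic_prior=0.5,   # stronger smoothing of document-topic weights
    topic_word_prior=0.5,  # stronger smoothing of topic-word weights
    random_state=0,
)
lda.fit(X)
print("perplexity:", lda.perplexity(X))
```

In practice these would be chosen by sweeping values and picking the setting that minimizes validation perplexity, not training perplexity.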