Experiment - Latent Dirichlet Allocation (LDA)
Problem: We want to discover hidden topics in a collection of text documents using Latent Dirichlet Allocation (LDA). The current model uses 5 topics, but the topics are not coherent and the model appears to overfit the training data.
Current Metrics: Training perplexity: 120.5, Validation perplexity: 180.3, Topic coherence (C_v): 0.32
Issue: The large gap between training perplexity (120.5) and validation perplexity (180.3) indicates overfitting, and the low C_v coherence (0.32) indicates poor topic quality.
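One common way to address both symptoms is to sweep the number of topics while tightening the Dirichlet priors (smaller `doc_topic_prior` and `topic_word_prior` encourage sparser, often more interpretable topics), selecting the configuration with the lowest held-out perplexity. The sketch below illustrates this with scikit-learn's `LatentDirichletAllocation`; the toy corpus and the specific prior values are hypothetical stand-ins, not the experiment's actual data or settings:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus standing in for the real document collection.
docs = [
    "cats dogs pets animal shelter adoption",
    "dog training puppy obedience pets",
    "stock market trading shares investors",
    "bond yields interest rates market economy",
    "soccer match goal team league players",
    "basketball team season playoff players",
    "pet food cats nutrition animal health",
    "investors economy inflation interest stocks",
    "league championship team soccer season",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
X_train, X_val = X[:6], X[6:]  # simple held-out split

results = {}
for k in (2, 5, 10):
    lda = LatentDirichletAllocation(
        n_components=k,
        doc_topic_prior=0.1,    # sparser document-topic mixtures (assumed value)
        topic_word_prior=0.01,  # sparser topic-word distributions (assumed value)
        random_state=0,
    )
    lda.fit(X_train)
    # Track train vs. validation perplexity; a widening gap signals overfitting.
    results[k] = (lda.perplexity(X_train), lda.perplexity(X_val))

for k, (train_ppl, val_ppl) in sorted(results.items()):
    print(f"k={k:2d}  train={train_ppl:8.1f}  val={val_ppl:8.1f}")
```

In practice one would pair this sweep with a coherence measure such as C_v (available in gensim's `CoherenceModel`) rather than relying on perplexity alone, since the two metrics do not always agree on topic quality.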