When choosing the number of topics for a topic model, we want metrics that indicate how interpretable and useful the resulting topics are. Common metrics include:
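- Coherence: Measures how semantically related the top words in each topic are, usually based on how often they co-occur in documents. Higher coherence means a topic's top words are more likely to read as a single theme.
- Perplexity: Measures how well the model predicts held-out documents it was not trained on. Lower perplexity means better generalization.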
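To make the two metrics concrete, here is a minimal pure-Python sketch. It assumes the UMass variant of coherence, which is only one of several coherence measures in use, and it assumes perplexity is computed from per-token log-probabilities that a fitted model has already assigned to held-out text. The function names and the toy corpus are hypothetical and exist only for illustration.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic (an assumed variant, one of several).

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs of top words, where D counts
    the documents containing the given word(s). Scores closer to zero are better.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # top_words is assumed to be ordered from most to least probable, so each
    # pair conditions the lower-ranked word on the higher-ranked one.
    for j, i in combinations(range(len(top_words)), 2):
        w_j, w_i = top_words[j], top_words[i]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip words that never appear in the reference corpus
        score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


def perplexity(token_log_probs):
    """Perplexity = exp of the negative mean per-token log-likelihood."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Toy tokenized corpus, made up for illustration.
docs = [
    ["cat", "dog", "pet", "vet"],
    ["dog", "pet", "leash"],
    ["cat", "pet", "food"],
    ["stock", "market", "trade"],
    ["market", "trade", "price"],
]

coherent_topic = ["pet", "dog", "cat"]    # words that co-occur in the corpus
mixed_topic = ["pet", "market", "leash"]  # words that rarely co-occur

coherent_score = umass_coherence(coherent_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)

# A model that assigns probability 1/4 to every held-out token has perplexity 4.
uniform_perplexity = perplexity([math.log(0.25)] * 8)
```

On this toy corpus the co-occurring word set scores higher than the mixed one. In practice these numbers would normally come from a library rather than hand-written code; for example, gensim provides a `CoherenceModel` class and scikit-learn's `LatentDirichletAllocation` has a `perplexity` method.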
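Coherence is often preferred because it tends to track human judgments of topic quality more closely than perplexity does. The goal is a number of topics with good coherence that avoids both extremes: too few topics, which makes each one too broad, and too many, which makes them too narrow.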
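One way to turn this into a selection rule is sketched below. It assumes you have already trained one model per candidate topic count and recorded a coherence score for each. The helper `pick_num_topics`, its `tolerance` parameter, and the scores themselves are hypothetical. Preferring the smallest topic count that scores near the best is just one reasonable heuristic, not a standard procedure.

```python
def pick_num_topics(coherence_by_k, tolerance=0.02):
    """Return the smallest topic count whose coherence is within `tolerance` of the best.

    Preferring the smallest near-best k is an assumed heuristic: it favors
    simpler models when extra topics buy almost no additional coherence.
    """
    best = max(coherence_by_k.values())
    candidates = [k for k, score in coherence_by_k.items() if best - score <= tolerance]
    return min(candidates)


# Placeholder scores for illustration only, not measured results.
example_scores = {5: 0.41, 10: 0.48, 15: 0.52, 20: 0.51, 30: 0.47}

chosen_k = pick_num_topics(example_scores)
print(chosen_k)
```

With these placeholder scores, both 15 and 20 topics fall within the tolerance of the best score, and the helper returns the smaller of the two. It is still worth reading the top words of the chosen model by hand, since a single score cannot confirm that the topics are at a useful level of granularity.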