
Topic coherence evaluation in NLP - Model Metrics & Evaluation

Which metric matters for Topic Coherence Evaluation and WHY

Topic coherence measures how well the words in a topic group together in a meaningful way. It helps us know if the topic model found clear and understandable themes. Good coherence means the topic words make sense together, like a group of friends who share interests. This is important because a model with high coherence gives topics that humans can easily interpret and trust.

Confusion Matrix or Equivalent Visualization

Topic coherence does not use a confusion matrix like classification. Instead, it uses scores computed from word co-occurrences in documents. For example, the UMass coherence score is calculated by comparing how often pairs of words appear together in the same documents.

    Coherence(topic) = \sum_{m=2}^M \sum_{l=1}^{m-1} \log \frac{D(w_m, w_l) + 1}{D(w_l)}

    where:
    - D(w_m, w_l) = number of documents containing both words w_m and w_l
    - D(w_l) = number of documents containing word w_l
    

Higher (less negative) coherence scores mean better topics. The sign and range depend on the method: UMass scores are typically negative, while NPMI-based scores fall between -1 and 1.
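The UMass formula above can be computed directly from document word sets. Here is a minimal sketch (the function name and corpus format are illustrative; it assumes every topic word appears in at least one document, so D(w_l) is never zero):

```python
from math import log

def umass_coherence(topic_words, documents):
    """UMass coherence: sum over ordered word pairs of
    log((D(w_m, w_l) + 1) / D(w_l)), where D counts documents.
    `documents` is a list of tokenized documents (lists of words)."""
    # Precompute each document's vocabulary for fast membership tests.
    docs = [set(d) for d in documents]

    def d_count(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    # m runs over the 2nd word onward; l over all earlier words.
    for m in range(1, len(topic_words)):
        for l in range(m):
            w_m, w_l = topic_words[m], topic_words[l]
            score += log((d_count(w_m, w_l) + 1) / d_count(w_l))
    return score
```

On a toy corpus, a topic like ["dog", "cat"] (words that co-occur) scores higher than ["dog", "computer"] (words that never share a document).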

Tradeoff: Coherence vs Number of Topics

Choosing more topics can lower coherence because topics become too specific or overlap. Choosing fewer topics can increase coherence but lose detail. For example:

  • With 5 topics, coherence might be high but topics are broad.
  • With 50 topics, coherence might drop because topics are noisy.

We balance coherence with the number of topics to get meaningful and distinct themes.
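One simple way to apply this balance in code: after sweeping several topic counts and scoring each, prefer the smallest count whose coherence is close to the best, since fewer topics are easier to interpret. A sketch (the function name, the tolerance value, and the example scores are all illustrative):

```python
def pick_num_topics(scores, tolerance=0.05):
    """Choose the smallest topic count whose coherence is within
    `tolerance` of the best observed score.
    scores: dict mapping num_topics -> coherence (e.g. from a sweep)."""
    best = max(scores.values())
    # Keep all topic counts whose score is "close enough" to the best,
    # then break ties toward fewer (more interpretable) topics.
    candidates = [k for k, v in scores.items() if v >= best - tolerance]
    return min(candidates)
```

For example, if 20 topics scores -0.58 and 10 topics scores -0.60, this rule picks 10: the tiny coherence gain from doubling the topic count is not worth the loss in interpretability.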

What Good vs Bad Coherence Looks Like

Good coherence: Topic words are related and form a clear theme, e.g., "dog, cat, pet, animal, leash".

Bad coherence: Topic words are unrelated or random, e.g., "dog, computer, sky, money, apple".

What counts as a good score depends on the metric: UMass scores are negative, and values closer to zero are better; NPMI-based scores range from -1 to 1, where values near 1 mean the words are strongly related and values near 0 mean they pair up no better than random.
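One of the metrics named in the pitfalls below, NPMI (normalized pointwise mutual information), makes this good-vs-bad contrast concrete by scoring a single word pair on a fixed -1 to 1 scale. A minimal document-level sketch (the function name and the epsilon smoothing value are illustrative, not a library API):

```python
from math import log

def npmi(w1, w2, documents):
    """Normalized PMI for a word pair via document co-occurrence.
    Returns ~1 for words that always co-occur, ~0 for independent
    words, and negative values for words that avoid each other."""
    eps = 1e-12  # illustrative smoothing to avoid log(0)
    docs = [set(d) for d in documents]
    n = len(docs)
    p1 = sum(w1 in d for d in docs) / n
    p2 = sum(w2 in d for d in docs) / n
    p12 = sum(w1 in d and w2 in d for d in docs) / n
    pmi = log((p12 + eps) / (p1 * p2 + eps))
    return pmi / -log(p12 + eps)  # normalize into [-1, 1]
```

On a toy corpus, "dog"/"cat" (always together) scores near 1, while "dog"/"sky" (never together) scores negative.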

Common Pitfalls in Topic Coherence Evaluation
  • Not removing stopwords: common words like "the" appear in almost every document, so they co-occur with everything and can falsely inflate coherence.
  • Data leakage: computing coherence on the same documents the model was trained on can give overly optimistic scores; prefer a held-out reference corpus.
  • Overfitting: Very high coherence with many topics may mean the model memorizes data, not generalizes.
  • Metric choice: Different coherence metrics (UMass, CV, NPMI) can give different results; choose one that fits your data and goals.
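The stopword pitfall above is easy to guard against: filter a topic's top words before scoring. A tiny sketch (the stopword list here is illustrative; a real pipeline would use a fuller list, e.g. from an NLP library):

```python
# Illustrative stopword list -- real pipelines use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}

def clean_topic(words, stopwords=STOPWORDS):
    """Drop stopwords from a topic's top words before computing
    coherence, so frequent function words do not inflate the score."""
    return [w for w in words if w.lower() not in stopwords]
```

Running coherence on `clean_topic(topic_words)` instead of the raw word list avoids rewarding topics for containing words that co-occur with everything.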
Self Check

Your topic model has a coherence score of -1.5 with 100 topics. Is this good?

Answer: No, a negative coherence score that low suggests topics are not meaningful. Also, 100 topics may be too many, causing noisy and overlapping topics. You should try fewer topics and check if coherence improves.

Key Result
Topic coherence measures how well topic words group meaningfully; higher coherence means clearer, more interpretable topics.