Topic coherence is a key metric in topic modeling. What does it measure?
Think about how well the words in a topic relate to each other in meaning.
Topic coherence measures how semantically related the top words in a topic are, indicating the quality of the topic.
What is the output of the following Python code that calculates topic coherence using Gensim?
from gensim.models import CoherenceModel
from gensim.corpora.dictionary import Dictionary

texts = [['apple', 'banana', 'fruit'], ['banana', 'orange', 'fruit'], ['apple', 'orange', 'fruit']]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
topics = [['apple', 'banana', 'fruit'], ['banana', 'orange', 'fruit']]
coherence_model = CoherenceModel(topics=topics, texts=texts, dictionary=dictionary, coherence='c_v')
score = coherence_model.get_coherence()
print(round(score, 2))
c_v coherence scores typically fall between 0 and 1; higher values mean the top words of a topic are more semantically related.
The code calculates the c_v coherence score for the given topics and texts, resulting in approximately 0.91.
You want to evaluate topic coherence on very short texts like tweets. Which coherence measure is most suitable?
Consider which measure works best with sparse or short documents.
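c_npmi coherence is generally the better choice for short texts: it normalizes pointwise mutual information (PMI) to the range [-1, 1], which keeps scores comparable when co-occurrence counts are sparse.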
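To make the normalization concrete, here is a minimal sketch of document-level NPMI coherence in plain Python. It is an illustration only, not Gensim's implementation: the function name `npmi_coherence`, the toy tweets, and the handling of the two edge cases (words that never co-occur, words that appear in every document) are assumptions of this sketch.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average document-level NPMI over all word pairs in a topic (hypothetical helper)."""
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        hits = sum(1 for d in doc_sets if all(w in d for w in words))
        return hits / n_docs

    pair_scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # The words never co-occur: use the lower bound of NPMI.
            pair_scores.append(-1.0)
        elif p12 == 1.0:
            # Both words are in every document: PMI and the normalizer are
            # both 0, so this sketch defines the score as 0 (a convention).
            pair_scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            pair_scores.append(pmi / -math.log(p12))
    return sum(pair_scores) / len(pair_scores)


# Toy tweet-like documents (made-up data).
tweets = [
    ["coffee", "morning", "latte"],
    ["coffee", "latte"],
    ["rain", "umbrella"],
    ["rain", "morning"],
]

coherent = npmi_coherence(["coffee", "latte"], tweets)  # always appear together
mixed = npmi_coherence(["coffee", "rain"], tweets)      # never appear together
print(coherent, mixed)
```

Because each pair score is bounded, a pair that always appears together scores close to 1 and a pair that never appears together scores -1, regardless of how rare the individual words are.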
When increasing the number of topics in a topic model, what is the typical effect on the coherence score?
Think about how splitting topics too much affects their quality.
Increasing the number of topics too far often lowers coherence, because topics start to overlap or fragment into less meaningful word groups.
Consider this code snippet that raises a ValueError when calculating coherence. What is the cause?
from gensim.models import CoherenceModel

texts = [['data', 'science'], ['machine', 'learning']]
topics = [['data', 'science'], ['machine', 'learning']]
coherence_model = CoherenceModel(topics=topics, texts=texts, coherence='c_v')
score = coherence_model.get_coherence()
Check if all required inputs for CoherenceModel are provided.
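CoherenceModel requires a dictionary to map words to ids whenever topics are passed explicitly as word lists, as they are here for 'c_v' coherence. Omitting it raises a ValueError when the CoherenceModel is constructed.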