Topic coherence evaluation in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What does topic coherence measure in topic modeling?

Topic coherence is a key metric in topic modeling. What does it measure?

A. The number of topics generated by the model
B. The speed at which the model converges during training
C. The size of the vocabulary used in the model
D. The semantic similarity between the top words in a topic
💡 Hint

Think about how well the words in a topic relate to each other in meaning.
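
To make "how well the top words relate to each other" concrete, the sketch below scores a topic by how often its top words share a document. It is a simplified, pure-Python illustration that averages normalized pointwise mutual information (NPMI) over word pairs. The function name `npmi_coherence`, the toy documents, and the handling of the two edge cases are assumptions made for illustration; this is not Gensim's implementation, which uses sliding windows and smoothing.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all pairs of topic words, using document co-occurrence."""
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            # Assumption: words that never co-occur get the minimum score.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Assumption: words present in every document carry no signal.
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy corpus with two clearly separated themes.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "vet"],
    ["stock", "market", "trade"],
    ["stock", "market", "bank"],
]

coherent = npmi_coherence(["cat", "dog", "pet"], docs)
mixed = npmi_coherence(["cat", "stock", "market"], docs)
print(f"coherent topic: {coherent:.2f}")
print(f"mixed topic:    {mixed:.2f}")
```

The topic whose words tend to appear in the same documents scores higher than the topic that mixes the two themes, which is the intuition behind coherence as a quality signal.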

Predict Output
intermediate
Output of coherence score calculation code

What is the output of the following Python code that calculates topic coherence using Gensim?

from gensim.models import CoherenceModel
from gensim.corpora.dictionary import Dictionary
texts = [['apple', 'banana', 'fruit'], ['banana', 'orange', 'fruit'], ['apple', 'orange', 'fruit']]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
topics = [['apple', 'banana', 'fruit'], ['banana', 'orange', 'fruit']]
coherence_model = CoherenceModel(topics=topics, texts=texts, dictionary=dictionary, coherence='c_v')
score = coherence_model.get_coherence()
print(round(score, 2))
A. 0.91
B. 0.45
C. 1.00
D. 0.00
💡 Hint

c_v coherence scores typically fall between 0 and 1; higher values mean the top words are more semantically related.

Model Choice
advanced
Choosing a coherence measure for short texts

You want to evaluate topic coherence on very short texts like tweets. Which coherence measure is most suitable?

A. u_mass coherence, which relies on document co-occurrence counts
B. c_npmi coherence, which uses normalized pointwise mutual information
C. c_v coherence, which combines a boolean sliding window with NPMI and cosine similarity
D. c_uci coherence, which uses pointwise mutual information with a sliding window
💡 Hint

Consider which measure works best with sparse or short documents.
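
To illustrate what "document co-occurrence counts" means on short texts, here is a minimal pure-Python sketch of a UMass-style score. It assumes each tweet is one tokenized document, uses +1 smoothing on the joint count, and averages over word pairs. The function name `umass_coherence` and the toy tweets are hypothetical, and the details may differ from Gensim's `u_mass` implementation.

```python
import math


def umass_coherence(topic_words, documents):
    """UMass-style coherence: mean of log((D(w_m, w_l) + 1) / D(w_l)) over ordered pairs."""
    doc_sets = [set(doc) for doc in documents]

    def doc_count(*words):
        # Number of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets)

    pair_scores = []
    for m in range(1, len(topic_words)):
        for l in range(m):
            denom = doc_count(topic_words[l])
            if denom == 0:
                continue  # skip conditioning words that never appear in the corpus
            joint = doc_count(topic_words[m], topic_words[l])
            pair_scores.append(math.log((joint + 1) / denom))
    return sum(pair_scores) / len(pair_scores) if pair_scores else float("nan")


# Hypothetical tweets, already tokenized.
tweets = [
    ["coffee", "morning"],
    ["coffee", "espresso"],
    ["morning", "run"],
    ["espresso", "coffee", "morning"],
]

related = umass_coherence(["coffee", "morning"], tweets)
unrelated = umass_coherence(["coffee", "run"], tweets)
print("related pair:  ", related)
print("unrelated pair:", unrelated)
```

Because the counts are taken per document, the score needs no window size, so the length of each tweet does not affect how co-occurrence is counted.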

Hyperparameter
advanced
Effect of number of topics on coherence score

When increasing the number of topics in a topic model, what is the typical effect on the coherence score?

A. Coherence score remains constant regardless of topic number
B. Coherence score fluctuates randomly with no pattern
C. Coherence score usually decreases because topics become less distinct and more noisy
D. Coherence score usually increases because more topics capture more details
💡 Hint

Think about how splitting topics too much affects their quality.

🔧 Debug
expert
Why does this coherence calculation raise a ValueError?

Consider this code snippet that raises a ValueError when calculating coherence. What is the cause?

from gensim.models import CoherenceModel
texts = [['data', 'science'], ['machine', 'learning']]
topics = [['data', 'science'], ['machine', 'learning']]
coherence_model = CoherenceModel(topics=topics, texts=texts, coherence='c_v')
score = coherence_model.get_coherence()
A. The dictionary parameter is missing, so the model cannot map words to ids
B. The topics list is empty, causing no data to compute coherence
C. The coherence type 'c_v' is not supported by Gensim
D. The texts contain words not present in topics, causing mismatch
💡 Hint

Check if all required inputs for CoherenceModel are provided.