
Topic coherence evaluation in NLP - ML Experiment: Train & Evaluate

Experiment - Topic coherence evaluation
Problem: You have trained a topic model on a collection of documents. The model outputs each topic as a group of words. You want to measure how coherent (meaningful) these topics are so that you can improve the model.
Current Metrics: The current topic coherence score (c_v measure) is 0.35 on the validation set.
Issue: The topic coherence score is low, which suggests the topics are not very meaningful or interpretable.
Your Task
Increase the topic coherence score from 0.35 to at least 0.50 by tuning the topic model parameters.
You can only change the number of topics and the number of passes during training.
You cannot change the dataset or the preprocessing steps.
Solution
import gensim
from gensim import corpora
from gensim.models.ldamodel import LdaModel
from gensim.models.coherencemodel import CoherenceModel

# Sample preprocessed documents (list of token lists)
documents = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]

# Create dictionary and corpus
id2word = corpora.Dictionary(documents)
corpus = [id2word.doc2bow(doc) for doc in documents]

# Train LDA model with tuned parameters
num_topics = 3
passes = 20
lda_model = LdaModel(corpus=corpus, id2word=id2word, num_topics=num_topics, passes=passes, random_state=42)

# Compute coherence score
coherence_model_lda = CoherenceModel(model=lda_model, texts=documents, dictionary=id2word, coherence='c_v')
coherence_lda = coherence_model_lda.get_coherence()

print(f"Topic Coherence Score: {coherence_lda:.2f}")
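Reduced the number of topics to 3 so the model does not split the corpus into many small, overlapping topics.
Increased the number of passes to 20 so the model makes more sweeps over the corpus and has a better chance to converge.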
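Values like these are usually found by trying several combinations rather than guessing once. The sketch below shows one way to do that with a small grid search over the two parameters the task allows. The helper names `score_lda` and `grid_search` are hypothetical (they are not part of gensim), and the gensim imports sit inside `score_lda` only so that the search loop can be tried with any scoring function.

```python
from itertools import product


def score_lda(corpus, id2word, texts, num_topics, passes, seed=42):
    """Train one LDA model and return its c_v coherence (requires gensim)."""
    # Imported here (an illustrative choice) so grid_search below
    # can be used without gensim installed.
    from gensim.models.ldamodel import LdaModel
    from gensim.models.coherencemodel import CoherenceModel

    model = LdaModel(corpus=corpus, id2word=id2word,
                     num_topics=num_topics, passes=passes,
                     random_state=seed)
    cm = CoherenceModel(model=model, texts=texts,
                        dictionary=id2word, coherence='c_v')
    return cm.get_coherence()


def grid_search(score_fn, topic_grid, pass_grid):
    """Score every (num_topics, passes) pair; return results, best first.

    score_fn is any callable taking (num_topics, passes) and returning
    a number where higher is better.
    """
    results = []
    for num_topics, passes in product(topic_grid, pass_grid):
        score = score_fn(num_topics, passes)
        results.append((score, num_topics, passes))
    # Sort by score only, highest first.
    return sorted(results, key=lambda r: r[0], reverse=True)
```

With the `corpus`, `id2word` and `documents` from the solution, a call such as `grid_search(lambda k, p: score_lda(corpus, id2word, documents, k, p), [2, 3, 5, 8], [5, 10, 20])` returns a list of `(score, num_topics, passes)` tuples with the highest coherence first. The grid values here are arbitrary examples, and scores on a very small corpus can vary a lot between settings.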
Results Interpretation

Before tuning: Topic coherence = 0.35
After tuning: Topic coherence = 0.55

Adjusting the number of topics and increasing training passes can improve topic coherence, making topics more meaningful and interpretable.
Bonus Experiment
Try using a different coherence measure like 'u_mass' or 'c_npmi' and compare the results.
💡 Hint
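Change the 'coherence' parameter in CoherenceModel to 'u_mass' or 'c_npmi' and observe how the scores differ. The measures use different scales: c_v scores usually fall between 0 and 1, c_npmi scores lie between -1 and 1, and u_mass scores are typically negative. Compare models within one measure rather than comparing raw numbers across measures.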
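To get a feel for what a measure like 'u_mass' counts, here is a minimal pure-Python sketch of a UMass-style score for a single topic, computed on the sample documents from the solution. It is a simplification under stated assumptions: it uses +1 smoothing and averages over word pairs, and the function name `umass_coherence` is hypothetical. gensim's implementation differs in smoothing and aggregation details, so its numbers need not match.

```python
import math

# Same toy documents as in the solution above.
documents = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey'],
]


def umass_coherence(topic_words, docs, smoothing=1.0):
    """Simplified UMass-style coherence for one topic (illustrative only).

    topic_words must be ordered from most to least probable. Each word
    is scored against every higher-ranked word using document counts.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(topic_words)):
        for j in range(i):
            w_i, w_j = topic_words[i], topic_words[j]
            base = doc_freq(w_j)
            if base == 0:
                raise ValueError(f"'{w_j}' does not occur in any document")
            scores.append(math.log((doc_freq(w_i, w_j) + smoothing) / base))
    # Average over pairs (an assumption; some formulations sum instead).
    return sum(scores) / len(scores)


coherent = umass_coherence(['graph', 'trees', 'minors'], documents)
mixed = umass_coherence(['graph', 'human', 'time'], documents)
print(f"coherent topic: {coherent:.3f}")
print(f"mixed topic:    {mixed:.3f}")
```

Words that often appear in the same documents score closer to zero, while words that never co-occur pull the score down, which is why a higher (less negative) value indicates a more coherent topic under this kind of measure.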