
Latent Dirichlet Allocation (LDA) in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
LDA Mastery Badge
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
Time limit: 1:30
What does the 'topic' represent in LDA?

In Latent Dirichlet Allocation, what does a 'topic' most accurately represent?

A. A single word that best describes a document
B. A fixed label assigned to each document
C. A cluster of documents grouped by similarity
D. A distribution over words showing which words are likely to appear together
💡 Hint

Think about how LDA models topics as probabilities over vocabulary.
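
To see this concretely: in scikit-learn, each row of `lda.components_` holds one topic's weights over the entire vocabulary, and normalizing a row gives that topic's probability distribution over words. A minimal sketch (the toy documents here are made up for illustration):

```python
# Minimal sketch (assumes scikit-learn is installed): in LDA, a "topic" is
# a distribution over the vocabulary, stored row-wise in lda.components_.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["cat sat mat", "dog ran fast", "cat dog mat"]
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

# One row per topic, one column per vocabulary word.
print(lda.components_.shape)  # (2, 6): 2 topics over a 6-word vocabulary

# Normalizing a row yields that topic's probability distribution over words.
topic0 = lda.components_[0] / lda.components_[0].sum()
print(round(topic0.sum(), 6))  # 1.0
```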

Predict Output
intermediate
Time limit: 2:00
Output of LDA topic distribution for a document

Given the following Python code using sklearn's LDA, what is the shape of doc_topic_dist?

Python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)
doc_topic_dist = lda.transform(dtm)
print(doc_topic_dist.shape)
A. (3, 2)
B. (2, 3)
C. (3, 3)
D. (2, 2)
💡 Hint

Check how many documents and topics are in the model.

Model Choice
advanced
Time limit: 2:00
Choosing the number of topics in LDA

You want to model topics in a large collection of news articles using LDA. Which approach is best to decide the number of topics?

A. Use domain knowledge and try multiple values, then select based on coherence or perplexity scores
B. Always set the number of topics to 10 by default
C. Set the number of topics equal to the number of documents
D. Use the number of unique words as the number of topics
💡 Hint

Think about how to balance model complexity and interpretability.
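
A hedged sketch of this selection loop: fit LDA for several candidate topic counts and compare perplexity (lower is better). The tiny corpus below is made up; in practice you would score a held-out split and also check topic coherence and human interpretability, not perplexity alone.

```python
# Sketch: sweep candidate topic counts and compare perplexity (lower = better).
# Toy corpus for illustration only; real use needs a held-out evaluation set.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana fruit sweet", "banana orange fruit juice",
        "dog cat pet animal", "cat mouse pet small"]
dtm = CountVectorizer().fit_transform(docs)

scores = {}
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(dtm)
    scores[k] = lda.perplexity(dtm)
    print(k, scores[k])

best_k = min(scores, key=scores.get)  # candidate with lowest perplexity
print("best k:", best_k)
```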

Metrics
advanced
Time limit: 1:30
Interpreting LDA perplexity score

After training an LDA model, you get a perplexity score of 1200 on your test set. What does a lower perplexity score indicate?

A. The model is overfitting the training data
B. The model predicts the test data better, indicating better generalization
C. The model has fewer topics
D. The model is ignoring rare words
💡 Hint

Perplexity measures how well the model predicts unseen data.
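
As a sketch of how this is measured in scikit-learn (toy documents made up for illustration): fit on training documents, then call `lda.perplexity` on held-out documents vectorized with the same vocabulary. A lower value means the model assigns higher likelihood to the unseen data.

```python
# Sketch: evaluate perplexity on held-out documents; lower values mean the
# model predicts unseen data better (better generalization).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["apple banana apple", "banana orange banana", "orange apple banana"]
test_docs = ["apple orange orange"]

vectorizer = CountVectorizer()
train_dtm = vectorizer.fit_transform(train_docs)
test_dtm = vectorizer.transform(test_docs)  # reuse the training vocabulary

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(train_dtm)
print(lda.perplexity(test_dtm))  # a positive score; lower is better
```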

🔧 Debug
expert
Time limit: 2:30
Spotting the error in an LDA code snippet

What is the result of running this code?

Python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["cat dog", "dog mouse", "cat mouse"]
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit_transform(dtm)
print(lda.components_.shape)
A. TypeError: fit_transform() missing 1 required positional argument
B. AttributeError: 'LatentDirichletAllocation' object has no attribute 'components_'
C. (3, 3)
D. ValueError: n_components must be less than or equal to number of features
💡 Hint

Check the shape of the document-term matrix and the number of topics.