Challenge - 5 Problems
LDA Mastery Badge
❓ Predict Output
Difficulty: intermediate
Output of LDA topic distribution prediction
Given the following code snippet using scikit-learn's LatentDirichletAllocation, what is the shape of the output variable topic_distribution after calling lda.transform(doc_term_matrix)?
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
vectorizer = CountVectorizer()
doc_term_matrix = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term_matrix)
topic_distribution = lda.transform(doc_term_matrix)
print(topic_distribution.shape)
💡 Hint
The output shape corresponds to number of documents and number of topics.
Explanation
The transform method returns the topic distribution for each document. Since there are 3 documents and 2 topics, the shape is (3, 2).
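The shape claim can be checked directly. The sketch below reuses the snippet from the question and also verifies that each row of the result sums to 1, since transform returns a normalized per-document distribution over topics. The variable names n_docs, n_topics, and row_sums are introduced here for illustration only.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
doc_term_matrix = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term_matrix)
topic_distribution = lda.transform(doc_term_matrix)

# One row per document, one column per topic.
n_docs, n_topics = topic_distribution.shape

# Each row is a probability distribution over the topics.
row_sums = topic_distribution.sum(axis=1)

print(n_docs, n_topics)
print(np.allclose(row_sums, 1.0))
```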
❓ Model Choice
Difficulty: intermediate
Choosing the correct model for topic modeling
Which scikit-learn model is specifically designed for discovering topics in a collection of documents by modeling word distributions per topic?
💡 Hint
This model uses a probabilistic approach to find topics.
Explanation
LatentDirichletAllocation (LDA) is a probabilistic model that finds topics by modeling word distributions per topic in documents.
❓ Hyperparameter
Difficulty: advanced
Effect of n_components in LDA
In scikit-learn's LatentDirichletAllocation, what does the hyperparameter n_components control?
💡 Hint
Think about how many groups of words the model tries to discover.
Explanation
The n_components parameter sets how many topics the LDA model will find.
❓ Metrics
Difficulty: advanced
Evaluating LDA model quality
Which metric is commonly used to evaluate the quality of topics generated by an LDA model in scikit-learn?
💡 Hint
This metric measures how well the model predicts a sample.
Explanation
Perplexity measures how well the LDA model predicts unseen data; lower perplexity indicates better generalization.
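A minimal sketch of measuring perplexity on held-out documents, using the perplexity and score methods of LatentDirichletAllocation. The train/test split and the toy documents are assumptions made for illustration; on a corpus this small the numbers are not meaningful, only the workflow is.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, split by hand into training and held-out documents.
train_docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
test_docs = ["apple apple banana", "orange orange banana"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
# Reuse the training vocabulary for the held-out documents.
X_test = vectorizer.transform(test_docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_train)

# Lower perplexity on held-out data suggests better generalization.
held_out_perplexity = lda.perplexity(X_test)

# score returns an approximate log-likelihood; higher is better.
approx_log_likelihood = lda.score(X_test)

print(held_out_perplexity)
print(approx_log_likelihood)
```

Comparing held-out perplexity across several values of n_components is one common way to use this metric, although it does not always agree with how interpretable the topics look.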
🔧 Debug
Difficulty: expert
Identifying error in LDA input data
What happens if you pass a dense NumPy array instead of a sparse matrix to the fit method of scikit-learn's LatentDirichletAllocation?
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

X = np.array([[1, 2, 3], [4, 5, 6]])
lda = LatentDirichletAllocation(n_components=2)
lda.fit(X)
💡 Hint
LDA accepts both dense arrays and sparse matrices as input.
Explanation
LatentDirichletAllocation accepts both dense numpy arrays (non-negative) and sparse matrices. No error occurs.
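The sketch below fits the dense array from the question and then shows the case that does fail: an array containing a negative value, which scikit-learn rejects with a ValueError. The arrays X_dense and X_negative and the flag negative_rejected are illustrative names, and random_state=0 is added only to make the run repeatable.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Dense, non-negative counts: fitting works without any conversion.
X_dense = np.array([[1, 2, 3], [4, 5, 6]])
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X_dense)
doc_topics = lda.transform(X_dense)
print(doc_topics.shape)

# A negative entry is not a valid count, so fit raises ValueError.
X_negative = np.array([[1, -2, 3], [4, 5, 6]])
try:
    LatentDirichletAllocation(n_components=2, random_state=0).fit(X_negative)
    negative_rejected = False
except ValueError as exc:
    negative_rejected = True
    print(f"rejected: {exc}")
```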