
LDA with scikit-learn in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of LDA topic distribution prediction
Given the following code snippet using scikit-learn's LatentDirichletAllocation, what is the shape of the output variable topic_distribution after calling lda.transform(doc_term_matrix)?
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
vectorizer = CountVectorizer()
doc_term_matrix = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term_matrix)
topic_distribution = lda.transform(doc_term_matrix)
print(topic_distribution.shape)
A. (3, 2)
B. (2, 3)
C. (3, 3)
D. (2, 2)
💡 Hint
The output has one row per document and one column per topic.
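As a minimal sketch of what the hint describes, the snippet below refits the same three-document corpus and inspects the result. The variable names `n_docs`, `n_topics`, and `row_sums` are introduced here for illustration only. It also checks that each row sums to 1, since `transform` returns a normalized document-topic distribution.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
doc_term_matrix = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term_matrix)
topic_distribution = lda.transform(doc_term_matrix)

# Rows index documents, columns index topics.
n_docs, n_topics = topic_distribution.shape
print(n_docs, n_topics)

# Each row is a probability distribution over topics.
row_sums = topic_distribution.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```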
Model Choice
intermediate
Choosing the correct model for topic modeling
Which scikit-learn model is specifically designed for discovering topics in a collection of documents by modeling word distributions per topic?
A. PCA
B. KMeans
C. RandomForestClassifier
D. LatentDirichletAllocation
💡 Hint
This model uses a probabilistic approach to find topics.
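To make "word distributions per topic" concrete, here is a rough sketch that normalizes the fitted `components_` matrix so each row becomes a distribution over the vocabulary. The toy corpus is reused from the first problem, and the names `vocab` and `topic_word` are illustrative assumptions, not part of the original exercise.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Recover the vocabulary in column order from the fitted vectorizer.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# components_ holds unnormalized topic-word weights; dividing each row
# by its sum gives a per-topic distribution over words.
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

for topic_idx, weights in enumerate(topic_word):
    ranked = [vocab[i] for i in np.argsort(weights)[::-1]]
    print(f"topic {topic_idx}: {ranked}")
```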
Hyperparameter
advanced
Effect of n_components in LDA
In scikit-learn's LatentDirichletAllocation, what does the hyperparameter n_components control?
A. The number of topics to find in the documents
B. The maximum number of iterations during training
C. The learning rate for the optimizer
D. The minimum document frequency for words
💡 Hint
Think about how many groups of words the model tries to discover.
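One way to see the effect of `n_components` is to fit the model with different values and compare the shape of `components_`. The sketch below assumes the toy corpus from the first problem; the `shapes` dictionary and `vocab_size` are hypothetical helper names used only for this illustration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
vocab_size = len(vectorizer.vocabulary_)

shapes = {}
for k in (2, 3):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    # components_ has one row per component and one column per vocabulary word.
    shapes[k] = lda.components_.shape
    print(k, shapes[k])
```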
Metrics
advanced
Evaluating LDA model quality
Which metric is commonly used to evaluate the quality of topics generated by an LDA model in scikit-learn?
A. F1 Score
B. Accuracy
C. Perplexity
D. Mean Squared Error
💡 Hint
This metric measures how well the model predicts a sample of documents; lower values indicate a better fit.
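For reference, a small sketch of how such a score can be computed with the fitted estimator's `perplexity` and `score` methods. It evaluates on the training corpus only to keep the example short; in practice the metric is usually computed on held-out documents. The corpus is the toy one from the first problem.

```python
import math
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# perplexity: lower is better; score: approximate log-likelihood, higher is better.
perplexity = lda.perplexity(X)
log_likelihood = lda.score(X)
print(f"perplexity={perplexity:.3f}, log-likelihood={log_likelihood:.3f}")
```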
🔧 Debug
expert
Identifying error in LDA input data
What error will occur if you pass a dense numpy array instead of a sparse matrix to scikit-learn's LatentDirichletAllocation fit method?
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

X = np.array([[1, 2, 3], [4, 5, 6]])
lda = LatentDirichletAllocation(n_components=2)
lda.fit(X)
A. ValueError: Input must be a sparse matrix or a non-negative array
B. No error, code runs successfully
C. AttributeError: 'numpy.ndarray' object has no attribute 'tocsc'
D. TypeError: Expected sparse matrix but got dense array
💡 Hint
LDA accepts both dense arrays and sparse matrices as input.
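The hint can be checked with a short sketch that fits the same counts once as a dense array and once as a SciPy CSR matrix. The second half is an added illustration, not part of the original problem: it shows that the input constraint that does matter is non-negativity, since negative entries raise a `ValueError`. The names `X_sparse`, `X_negative`, and `raised` are assumptions made for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import LatentDirichletAllocation

X_dense = np.array([[1, 2, 3], [4, 5, 6]])
X_sparse = csr_matrix(X_dense)

# Both input types are accepted by fit_transform.
dense_topics = LatentDirichletAllocation(
    n_components=2, random_state=0
).fit_transform(X_dense)
sparse_topics = LatentDirichletAllocation(
    n_components=2, random_state=0
).fit_transform(X_sparse)
print(dense_topics.shape, sparse_topics.shape)

# Negative values are rejected, whether the input is dense or sparse.
X_negative = np.array([[1, -2, 3], [4, 5, 6]])
raised = False
try:
    LatentDirichletAllocation(n_components=2, random_state=0).fit(X_negative)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```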