Challenge - 5 Problems

❓ Predict Output

intermediate

Output of LDA topic distribution prediction

Given the following code snippet using scikit-learn's LatentDirichletAllocation, what is the shape of the output variable topic_distribution after calling lda.transform(doc_term_matrix)?

NLP

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
vectorizer = CountVectorizer()
doc_term_matrix = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term_matrix)
topic_distribution = lda.transform(doc_term_matrix)
print(topic_distribution.shape)

A. (3, 2)

B. (2, 3)

C. (3, 3)

D. (2, 2)

❓ Model Choice

intermediate

Choosing the correct model for topic modeling

Which scikit-learn model is specifically designed for discovering topics in a collection of documents by modeling word distributions per topic?

A. PCA

B. KMeans

C. RandomForestClassifier

D. LatentDirichletAllocation

❓ Hyperparameter

advanced

Effect of n_components in LDA

In scikit-learn's LatentDirichletAllocation, what does the hyperparameter n_components control?

A. The number of topics to find in the documents

B. The maximum number of iterations during training

C. The learning rate for the optimizer

D. The minimum document frequency for words
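
Once you have committed to an answer, one way to check it is to fit the model with several values of n_components and compare the shapes of the fitted attributes. The sketch below reuses the three toy documents from the first challenge; the helper name fitted_shapes is hypothetical and exists only for this illustration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
X = CountVectorizer().fit_transform(docs)  # 3 documents x 3 vocabulary terms


def fitted_shapes(n_components):
    """Fit LDA and return the shapes of components_ and of the transform output."""
    lda = LatentDirichletAllocation(n_components=n_components, random_state=0)
    doc_topics = lda.fit_transform(X)
    return lda.components_.shape, doc_topics.shape


# Only n_components changes between runs; the data stays the same.
for k in (2, 3, 4):
    print(k, fitted_shapes(k))
```

Comparing the printed shapes across runs shows which dimension follows n_components and which dimensions are fixed by the data.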

❓ Metrics

advanced

Evaluating LDA model quality

Which metric is commonly used to evaluate the quality of topics generated by an LDA model in scikit-learn?

A. F1 Score

B. Accuracy

C. Perplexity

D. Mean Squared Error
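
For reference after answering, here is a minimal sketch of how a fitted LatentDirichletAllocation exposes its built-in evaluation methods, perplexity and score. It assumes the same toy corpus as the first challenge and, for brevity, evaluates on the training data; in practice these values are usually computed on held-out documents.

```python
import math

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana apple", "banana orange banana", "apple orange orange"]
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# perplexity(): lower values mean the model fits the counts better.
perplexity = lda.perplexity(X)
# score(): approximate log-likelihood, higher (closer to zero) is better.
log_likelihood = lda.score(X)

print("perplexity:", perplexity)
print("approximate log-likelihood:", log_likelihood)
```

Both numbers describe how well the model fits the word counts, not how interpretable the topics are, so they are best used to compare models trained on the same corpus.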

🔧 Debug

expert

Identifying error in LDA input data

What error will occur if you pass a dense numpy array instead of a sparse matrix to scikit-learn's LatentDirichletAllocation fit method?

NLP

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

X = np.array([[1, 2, 3], [4, 5, 6]])
lda = LatentDirichletAllocation(n_components=2)
lda.fit(X)

A. ValueError: Input must be a sparse matrix or a non-negative array

B. No error, code runs successfully

C. AttributeError: 'numpy.ndarray' object has no attribute 'tocsc'

D. TypeError: Expected sparse matrix but got dense array
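
After choosing an answer, you can test it empirically. The sketch below fits the model on the dense array from the question, on an equivalent SciPy sparse matrix, and on a negated copy, then records what happens in each case. The helper try_fit and the results dictionary are hypothetical names used only for this illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import LatentDirichletAllocation

X_dense = np.array([[1, 2, 3], [4, 5, 6]])
X_sparse = csr_matrix(X_dense)  # same counts in sparse form


def try_fit(X):
    """Return 'ok' if fit succeeds, otherwise the name of the exception raised."""
    try:
        LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
        return "ok"
    except Exception as exc:
        return type(exc).__name__


results = {
    "dense": try_fit(X_dense),
    "sparse": try_fit(X_sparse),
    "negative": try_fit(-X_dense),  # counts must be non-negative
}
print(results)
```

The third case is included because the sign of the values, rather than the container type, is what the input validation is concerned with.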

Practice

(1/5)

1. What is the main purpose of using LDA (Latent Dirichlet Allocation) in text analysis?

easy

A. To remove stop words from text data

B. To translate text from one language to another

C. To count the number of words in a document

D. To find hidden topics by grouping words that often appear together

LDA with scikit-learn in NLP - Practice Problems & Coding Challenges

Solution

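Step 1: Understand LDA's goal

LDA models each document as a mixture of topics and each topic as a distribution over words, so its purpose is to uncover hidden themes from the words that tend to appear together.

Step 2: Compare options with LDA's purpose

Removing stop words, translating text, and counting words are preprocessing or unrelated tasks; only option D describes topic discovery.

Final Answer: D. To find hidden topics by grouping words that often appear together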

Quick Check:

Solution

Step 1: Recall correct import path

Step 2: Check each option

Final Answer:

Quick Check:

Solution

Step 1: Understand input and model parameters

Step 2: Determine output shape of lda.transform

Final Answer:

Quick Check:

Solution

Step 1: Check usage of fit_transform

Step 2: Verify attribute and parameters

Final Answer:

Quick Check:

Solution

Step 1: Understand lda.components_ role

Step 2: Map top weights to words

Final Answer:

Quick Check: