Easy · 📝 Conceptual · Q2 of 15
NLP - Topic Modeling
Which scikit-learn class is used to convert text documents into a matrix of token counts before applying LDA?
A. StandardScaler
B. CountVectorizer
C. LabelEncoder
D. TfidfTransformer
Step-by-Step Solution:
  1. Step 1: Identify the preprocessing step for LDA

    LDA requires a matrix of token counts, which is created by CountVectorizer.
  2. Step 2: Differentiate from other transformers

    TfidfTransformer produces TF-IDF weights, not raw counts; LabelEncoder encodes target labels, not documents; StandardScaler scales numeric features and does not apply to text.
  3. Final Answer:

    CountVectorizer -> Option B
  4. Quick Check:

    Token counts = CountVectorizer [OK]
Quick Trick: CountVectorizer builds the count matrix that LDA expects [OK]
Common Mistakes:
  • Using TfidfTransformer instead of CountVectorizer
  • Confusing label encoding with text vectorization
  • Applying scaling to text data
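The pipeline described above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is installed; the toy documents and the choice of two topics are made up for the example.

```python
# Sketch: CountVectorizer feeding token counts into LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus (hypothetical example data)
docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "the dog chased the cat",
]

vectorizer = CountVectorizer()           # builds the token-count matrix LDA expects
counts = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts)       # document-topic distributions

print(topics.shape)  # one row per document, one column per topic
```

Note that swapping in TfidfTransformer here would hand LDA fractional TF-IDF weights rather than the counts its generative model assumes, which is exactly the common mistake listed above.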
