Easy · 📝 Conceptual · Q2 of 15
NLP - Topic Modeling
Which scikit-learn class is used to convert text documents into a matrix of token counts before applying LDA?
A. StandardScaler
B. CountVectorizer
C. LabelEncoder
D. TfidfTransformer
Step-by-Step Solution:
  1. Step 1: Identify the preprocessing step for LDA

    LDA requires a matrix of token counts, which is created by CountVectorizer.
  2. Step 2: Differentiate from other transformers

    TfidfTransformer produces TF-IDF weights, not raw counts; LabelEncoder encodes target labels, not documents; StandardScaler scales numeric features and does not apply to text.
  3. Final Answer:

    CountVectorizer -> Option B
  4. Quick Check:

    Token counts = CountVectorizer [OK]
Quick Trick: CountVectorizer builds the count matrix that LDA expects [OK]
Common Mistakes:
  • Using TfidfTransformer instead of CountVectorizer
  • Confusing label encoding with text vectorization
  • Applying scaling to text data
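The pipeline described above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is installed; the toy documents and the choice of two topics are made up for the example.

```python
# Sketch: CountVectorizer feeding token counts into LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus (hypothetical example data)
docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "the dog chased the cat",
]

vectorizer = CountVectorizer()           # builds the token-count matrix LDA expects
counts = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts)       # document-topic distributions

print(topics.shape)  # one row per document, one column per topic
```

Note that swapping in TfidfTransformer here would hand LDA fractional TF-IDF weights rather than the counts its generative model assumes, which is exactly the common mistake listed above.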
