
Why text classification categorizes documents in NLP - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to import the library used for text classification.

NLP
from sklearn.[1] import MultinomialNB
A. feature_extraction
B. metrics
C. naive_bayes
D. model_selection
Common Mistakes
Importing from the wrong module like feature_extraction or metrics.
Trying to import MultinomialNB directly from sklearn.
2. Fill in the blank
medium

Complete the code to convert text documents into numbers for the model.

NLP
from sklearn.feature_extraction.text import [1]
vectorizer = [1]()
A. LabelEncoder
B. CountVectorizer
C. TfidfTransformer
D. StandardScaler
Common Mistakes
Using TfidfTransformer directly without first vectorizing text.
Using LabelEncoder, which is for labels, not text features.
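The first mistake above is easier to see in a short sketch. It assumes scikit-learn is installed, and the two sample documents are invented for illustration. `TfidfTransformer` works on a matrix of counts, so raw text has to pass through a vectorizer first.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical sample documents, invented for this sketch.
docs = ["free prize inside", "meeting at noon"]

# Step 1: turn raw text into a sparse matrix of token counts.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Step 2 (optional): reweight the counts with TF-IDF.
# TfidfTransformer takes the count matrix, not the raw strings.
tfidf = TfidfTransformer().fit_transform(counts)

# One row per document, one column per vocabulary word.
n_docs, n_words = counts.shape
```

Each row of `counts` corresponds to one document and each column to one word in the learned vocabulary; `tfidf` has the same shape with reweighted values.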
3. Fill in the blank
hard

Complete the code to train the text classification model.

NLP
model = MultinomialNB()
model.[1](X_train, y_train)
A. fit
B. predict
C. transform
D. score
Common Mistakes
Using predict instead of fit to train the model.
Using transform, which is for data preprocessing, not training.
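To show how training and prediction fit together, here is a minimal sketch, assuming scikit-learn is installed. The toy documents and labels are invented for illustration and are far too small for a real classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, invented for this sketch.
train_docs = [
    "win a free prize now",
    "free cash prize",
    "lunch meeting today",
    "project meeting notes",
]
y_train = ["spam", "spam", "ham", "ham"]

# Learn the vocabulary and build the count matrix in one step.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Training: the model estimates per-class word probabilities.
model = MultinomialNB()
model.fit(X_train, y_train)

# Prediction comes after training; new text reuses the fitted
# vectorizer through transform, not fit_transform.
X_new = vectorizer.transform(["free prize"])
prediction = model.predict(X_new)
```

Calling `predict` on a model that has not been fitted raises an error, which is why the training call has to come first.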
4. Fill in the blank
hard

Fill both blanks to create a dictionary that maps each word longer than 3 letters in a document to its count.

NLP
word_counts = {word: [1] for word in document.split() if len(word) [2] 3}
A. document.count(word)
B. len(word)
C. >
D. <=
Common Mistakes
Using len(word) in the first blank, which gives the word's length, not its count.
Using '<=' instead of '>' in the condition.
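One detail worth knowing about this comprehension: `str.count` matches substrings of the whole text, so a short word is also counted inside longer words. The sketch below uses an invented sample string to compare it with a whole-word count based on `collections.Counter`, offered as one possible alternative rather than the exercise's intended answer.

```python
from collections import Counter

# Hypothetical sample text, invented for this sketch.
document = "spam spammer spam offer now"

# The comprehension from the exercise: str.count scans the whole
# string, so "spam" is also found inside "spammer".
substring_counts = {
    word: document.count(word)
    for word in document.split()
    if len(word) > 3
}

# A whole-word alternative: count the tokens first, then keep
# only the words longer than 3 letters.
token_counts = Counter(document.split())
word_counts = {word: n for word, n in token_counts.items() if len(word) > 3}
```

With this input, `substring_counts` reports 3 for `"spam"` while `word_counts` reports 2. For text where no word is contained in another, the two approaches agree.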
5. Fill in the blank
hard

Fill all three blanks to keep the documents labeled 'spam' whose first word is longer than 4 letters, mapping each one to its uppercase text.

NLP
filtered = {doc: [1] for doc, label in dataset.items() if label == '[2]' and len(doc.split()[0]) [3] 4}
A. len(doc.split()[0])
B. spam
C. >
D. doc.upper()
Common Mistakes
Using len(doc.split()[0]) in the second blank instead of the label string.
Using '<' instead of '>' in the length comparison.
Not transforming the document text in the first blank.
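The filtering pattern in this task can be tried on a small example. The sketch below assumes `dataset` is a dictionary mapping document text to its label; the three entries are invented for illustration.

```python
# Hypothetical dataset mapping document text to its label,
# invented for this sketch.
dataset = {
    "Congratulations you won": "spam",
    "Free cash": "spam",
    "Lunch at noon": "ham",
}

# Keep spam documents whose first word has more than 4 letters,
# and store the uppercase text as the value.
filtered = {
    doc: doc.upper()
    for doc, label in dataset.items()
    if label == "spam" and len(doc.split()[0]) > 4
}
```

Only the first entry survives: "Free cash" is spam, but its first word has exactly 4 letters, and "Lunch at noon" has the wrong label. Note that `doc.split()[0]` raises an `IndexError` for an empty document, so real data may need a guard.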