
Handling imbalanced text data in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to import the class used for oversampling minority classes.

from imblearn.over_sampling import [1]
Options:
A. TfidfVectorizer
B. RandomOverSampler
C. SMOTE
D. CountVectorizer
Common Mistakes
Confusing vectorizers with oversampling methods.
Choosing RandomOverSampler: it is also an oversampler in imblearn.over_sampling, but this exercise expects SMOTE.
2. Fill in the blank (medium)

Complete the code to convert text data into numerical features using TF-IDF.

from sklearn.feature_extraction.text import [1]
tfidf = [1](stop_words='english')
X = tfidf.fit_transform(texts)
Options:
A. CountVectorizer
B. TfidfVectorizer
C. SMOTE
D. RandomOverSampler
Common Mistakes
Using CountVectorizer, which produces raw term counts rather than TF-IDF weights.
Choosing an oversampling class here; those belong to imblearn, not to sklearn.feature_extraction.text.
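To make the difference between raw counts and TF-IDF weights concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: the helper names `term_counts` and `tfidf_weights` and the toy corpus are hypothetical, tokenization is a plain whitespace split, and stop-word removal and L2 normalization are left out. The IDF formula used, ln((1 + n) / (1 + df)) + 1, follows the smoothed variant that scikit-learn documents as its default.

```python
import math
from collections import Counter


def term_counts(docs):
    """Raw term counts per document (the kind of value a count vectorizer records)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_weights(docs):
    """TF-IDF weights with smoothed IDF: ln((1 + n) / (1 + df)) + 1.

    Assumption: simplified sketch, no stop words and no L2 normalization.
    """
    counts = term_counts(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in c.items()
        }
        for c in counts
    ]


docs = ["spam offer now", "meeting now", "lunch now"]  # hypothetical toy corpus
counts = term_counts(docs)
weights = tfidf_weights(docs)
```

In the first document, "spam" and "now" have the same raw count, but "now" appears in every document and so receives a lower TF-IDF weight than "spam".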
3. Fill in the blank (hard)

Complete the call that applies SMOTE to the feature matrix X and labels y.

from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample([1], y)
Options:
A. X
B. y
C. smote
D. random_state
Common Mistakes
Swapping the order of X and y.
Passing the SMOTE object instead of data.
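SMOTE needs numeric features because it builds new minority samples by interpolating between existing ones, which is why the feature matrix comes first and the labels second in `fit_resample`. The sketch below illustrates only that interpolation idea in pure Python. It is not imblearn's implementation: the function `smote_like_sample` is hypothetical, and it always uses the single nearest neighbor, whereas SMOTE picks among several nearest neighbors.

```python
import random


def smote_like_sample(minority_rows, rng):
    """Create one synthetic row on the segment between a minority row
    and its nearest minority neighbor (simplification: k = 1)."""
    base = rng.choice(minority_rows)
    others = [row for row in minority_rows if row is not base]
    # Nearest neighbor by squared Euclidean distance.
    neighbor = min(
        others,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
    )
    lam = rng.random()  # interpolation factor in [0, 1)
    return [a + lam * (b - a) for a, b in zip(base, neighbor)]


rng = random.Random(42)
minority_rows = [[0.0, 1.0], [1.0, 3.0], [4.0, 0.0]]  # hypothetical numeric features
synthetic = smote_like_sample(minority_rows, rng)
```

Each synthetic row lies between two real minority rows, so it stays inside the range of values the minority class already covers.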
4. Fill in the blanks (hard)

Fill both blanks to create a balanced dataset using RandomOverSampler and convert text to features.

from imblearn.over_sampling import [1]
from sklearn.feature_extraction.text import [2]
ros = [1](random_state=0)
tfidf = [2](stop_words='english')
X = tfidf.fit_transform(texts)
X_resampled, y_resampled = ros.fit_resample(X, y)
Options:
A. RandomOverSampler
B. SMOTE
C. TfidfVectorizer
D. CountVectorizer
Common Mistakes
Choosing SMOTE when the task asks for RandomOverSampler.
Using CountVectorizer instead of TfidfVectorizer.
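Random oversampling balances classes by repeating existing minority samples rather than synthesizing new ones. The following pure-Python sketch shows the idea under simplifying assumptions; `random_oversample` and the toy data are hypothetical and this is not how imblearn implements RandomOverSampler internally.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)
    target = max(len(group) for group in by_label.values())

    out_rows, out_labels = [], []
    for label, group in by_label.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


texts = ["ok", "fine", "good", "great", "scam"]  # hypothetical toy samples
labels = [0, 0, 0, 0, 1]
new_texts, new_labels = random_oversample(texts, labels)
class_sizes = Counter(new_labels)
```

After resampling, both classes have four rows, and no new values appear: the minority class simply contains repeated copies of its original sample.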
5. Fill in the blanks (hard)

Fill all three blanks to create a pipeline that balances data and trains a logistic regression model.

from imblearn.pipeline import Pipeline
from sklearn.linear_model import [1]
from imblearn.over_sampling import [2]
from sklearn.feature_extraction.text import [3]
pipeline = Pipeline([
    ('vectorizer', [3](stop_words='english')),
    ('oversample', [2](random_state=42)),
    ('classifier', [1]())
])
Options:
A. LogisticRegression
B. RandomOverSampler
C. TfidfVectorizer
D. SMOTE
Common Mistakes
Using SMOTE instead of RandomOverSampler in this pipeline.
Confusing CountVectorizer with TfidfVectorizer.
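One reason to import Pipeline from imblearn rather than from sklearn is that it accepts a sampler step and applies it only while fitting, so predictions are made on the data exactly as given. The toy sketch below mimics that behavior in pure Python. Every class and function in it (`TinyPipeline`, `MajorityModel`, `duplicate_minority`, `to_features`) is a hypothetical stand-in for illustration, not part of imblearn or scikit-learn.

```python
from collections import Counter


def to_features(text):
    """Toy transform standing in for a vectorizer: the word count of the text."""
    return [len(text.split())]


def duplicate_minority(X, y):
    """Toy sampler: cycle through each class's rows until all classes
    match the size of the largest class."""
    target = max(Counter(y).values())
    X_out, y_out = [], []
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        for i in range(target):
            X_out.append(rows[i % len(rows)])
            y_out.append(label)
    return X_out, y_out


class MajorityModel:
    """Toy classifier: records what it was fit on, predicts the most common label."""

    def fit(self, X, y):
        self.rows_seen = len(X)
        self.label_counts = Counter(y)
        return self

    def predict(self, X):
        top = self.label_counts.most_common(1)[0][0]
        return [top for _ in X]


class TinyPipeline:
    """Transform, then resample (fit only), then model."""

    def __init__(self, transform, sampler, model):
        self.transform, self.sampler, self.model = transform, sampler, model

    def fit(self, texts, labels):
        X = [self.transform(t) for t in texts]
        X, y = self.sampler(X, labels)  # resampling happens during fit only
        self.model.fit(X, y)
        return self

    def predict(self, texts):
        X = [self.transform(t) for t in texts]  # no resampling at prediction time
        return self.model.predict(X)


texts = ["ok", "all fine", "looks good", "win cash now"]  # hypothetical toy data
labels = [0, 0, 0, 1]
pipe = TinyPipeline(to_features, duplicate_minority, MajorityModel()).fit(texts, labels)
preds = pipe.predict(["free money", "see you"])
```

The model is fit on six rows (three per class) even though only four were supplied, while `predict` returns exactly one label per input text.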