Recall & Review
beginner
What is an imbalanced class problem in machine learning?
It happens when one class has many more examples than another, making the model biased toward the bigger class.
beginner
What does SMOTE stand for and what does it do?
SMOTE means Synthetic Minority Over-sampling Technique. It creates new synthetic examples for the smaller class to balance the data.
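The interpolation idea behind SMOTE can be illustrated with a minimal pure-Python sketch. This is not the imbalanced-learn implementation: the function name smote_sketch, its parameters, and the toy points are all hypothetical, chosen only to show how a synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbors.

```python
import random

# Hypothetical helper for illustration only; not the imbalanced-learn API.
def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbors at random.
        neighbor = rng.choice(others[:k])
        # lam in [0, 1) sets the position along the segment base -> neighbor.
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

# Toy minority class with two features per sample (made-up values).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
new_points = smote_sketch(minority, n_new=5)
print(len(new_points))
```

Because every synthetic point is a blend of two real minority samples, it always lies between them, which is also why sparse or noisy minority data can yield unrealistic points.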
beginner
How do class weights help with imbalanced classes?
Class weights tell the model to pay more attention to the smaller class by making mistakes on it costlier during training.
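As a rough illustration of how such weights can be derived, the sketch below applies the formula that scikit-learn documents for class_weight='balanced', n_samples / (n_classes * count_of_class), in plain Python. The helper name balanced_class_weights and the 95/5 toy labels are assumptions made for this example.

```python
from collections import Counter

# Hypothetical helper; mirrors the documented 'balanced' formula in plain Python.
def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy labels: 95 majority samples and 5 minority samples.
y = [0] * 95 + [1] * 5
weights = balanced_class_weights(y)
print(weights)
```

With these labels the minority class gets a weight of 10.0 and the majority class about 0.53, so a mistake on a minority sample costs roughly 19 times as much during training.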
intermediate
When should you prefer SMOTE over class weights?
Use SMOTE when you want to enlarge the minority class by adding synthetic samples. Use class weights when you want to leave the data as is and only change how much each class counts during training.
intermediate
What is a potential risk of using SMOTE?
SMOTE can create noisy or unrealistic samples if the minority class is very small or complex, which may confuse the model.
What problem does SMOTE solve?
A. Too few examples in the minority class
B. Too many features in the dataset
C. Overfitting on training data
D. Missing values in data
Answer: A
SMOTE creates synthetic samples to increase the number of minority class examples.
How do class weights affect model training?
A. They increase the learning rate
B. They balance the dataset by adding samples
C. They reduce the number of features
D. They make errors on minority class more costly
Answer: D
Class weights assign a higher penalty to mistakes on the minority class to balance learning.
Which method adds new data points to balance classes?
A. SMOTE
B. Class weights
C. Feature scaling
D. Cross-validation
Answer: A
SMOTE generates synthetic samples to increase minority class size.
What is a downside of using SMOTE?
A. It reduces model accuracy
B. It removes minority class samples
C. It can create unrealistic samples
D. It ignores the majority class
Answer: C
SMOTE may create noisy or unrealistic synthetic data if not used carefully.
When might class weights be preferred over SMOTE?
A. When you want to add synthetic samples
B. When you want to keep original data unchanged
C. When dataset is perfectly balanced
D. When you want to reduce training time
Answer: B
Class weights adjust training focus without changing the data.
Explain how SMOTE works and why it helps with imbalanced classes.
Think about how adding new examples can help the model see more minority class data.
Describe how class weights influence model training on imbalanced data.
Consider how the model treats mistakes differently for each class.
Practice
1. What is the main purpose of using SMOTE in machine learning?
easy
A. To create synthetic samples for minority classes to balance the dataset
B. To reduce the size of the majority class by removing samples
C. To increase the number of features in the dataset
D. To randomly shuffle the dataset before training
Solution
Step 1: Understand SMOTE's role in imbalanced data
SMOTE stands for Synthetic Minority Over-sampling Technique and it creates new synthetic samples for the minority class.
Step 2: Compare options with SMOTE's function
Only option A describes SMOTE's purpose: balancing the classes by adding synthetic minority samples.
Final Answer:
To create synthetic samples for minority classes to balance the dataset -> Option A
Quick Check:
SMOTE = Synthetic samples for minority [OK]
Hint: SMOTE = make new minority samples to balance [OK]
Common Mistakes:
Thinking SMOTE removes majority samples
Confusing SMOTE with feature engineering
Assuming SMOTE shuffles data
2. Which of the following is the correct way to set class weights in scikit-learn's LogisticRegression?
easy
A. LogisticRegression(class_weight='balanced')
B. LogisticRegression(weight_class='balanced')
C. LogisticRegression(classweights='balanced')
D. LogisticRegression(weights='balanced')
Solution
Step 1: Recall scikit-learn parameter for class weights
The correct parameter name is class_weight and it accepts 'balanced' to auto-adjust weights.
Step 2: Match options with correct syntax
Only option A, LogisticRegression(class_weight='balanced'), uses the exact parameter name class_weight.
Final Answer:
LogisticRegression(class_weight='balanced') -> Option A
Quick Check:
Parameter name is class_weight [OK]
Hint: Use class_weight='balanced' exactly in model init [OK]
Common Mistakes:
Using wrong parameter names like weight_class
Misspelling class_weight
Passing weights instead of class_weight
3. Given this code snippet using SMOTE, what will it print for the lengths of X_resampled and y_resampled?
from imblearn.over_sampling import SMOTE
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)
print(len(X_resampled), len(y_resampled))
medium
A. 8 8
B. 6 6
C. 10 10
D. 12 12
Solution
Step 1: Count original class samples
Class 0 has 3 samples, class 1 has 3 samples, so dataset is balanced initially.
Step 2: Understand SMOTE behavior on balanced data
With default settings, SMOTE oversamples the minority class until it matches the majority class in size. Here both classes already have 3 samples, so there is nothing to generate.
Step 3: Check actual output
No synthetic samples are added, so both lengths stay at 6 and the code prints 6 6.
Final Answer:
6 6 -> Option B
Quick Check:
Balanced classes, no new samples added [OK]
Hint: SMOTE adds samples only if classes are imbalanced [OK]
Common Mistakes:
Assuming SMOTE always doubles data
Ignoring original class counts
Confusing sample count with feature count
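The counting argument in this solution can be checked with a small plain-Python sketch. It assumes SMOTE's default behavior of raising every non-majority class to the majority count; the helper name synthetic_needed and the second label list are hypothetical. It only does the arithmetic and does not call imbalanced-learn, which additionally needs enough minority samples for its neighbor search.

```python
from collections import Counter

# Hypothetical helper: arithmetic only, no actual resampling is performed.
def synthetic_needed(labels):
    """Samples each class would gain if raised to the majority class count."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority - n for cls, n in counts.items()}

balanced = [0, 0, 0, 1, 1, 1]      # labels from the question
imbalanced = [0, 0, 0, 0, 1, 1]    # made-up contrast case

print(synthetic_needed(balanced))    # {0: 0, 1: 0}
print(synthetic_needed(imbalanced))  # {0: 0, 1: 2}

# Total length after resampling = original length + synthetic samples.
total_balanced = len(balanced) + sum(synthetic_needed(balanced).values())
print(total_balanced)  # 6
```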
4. You wrote this code to apply class weights but the model accuracy is very low. What is the likely error?
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(class_weight={'0':1, '1':10})
model.fit(X_train, y_train)
medium
A. LogisticRegression does not support class weights
B. class_weight parameter does not accept dictionaries
C. Class weights keys should be integers, not strings
D. class_weight values must sum to 1
Solution
Step 1: Check class_weight dictionary keys
Class labels in class_weight must match label types in y_train. Usually labels are integers 0 and 1, not strings '0' and '1'.
Step 2: Understand impact of wrong keys
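If keys are strings but labels are integers, no key matches a real class, so the intended weighting is never applied. Depending on the scikit-learn version, fit may raise a ValueError about the mismatched labels instead of training silently.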
Final Answer:
Class weights keys should be integers, not strings -> Option C
Quick Check:
Keys must match label types [OK]
Hint: Match class_weight keys to label data types exactly [OK]
Common Mistakes:
Using string keys instead of integer keys
Thinking class_weight can't be a dict
Believing weights must sum to 1
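A quick pre-flight check can catch this kind of mismatch before training. The sketch below is plain Python; the helper name unmatched_weight_keys and the toy y_train are assumptions for illustration and are not part of scikit-learn.

```python
# Hypothetical helper; not a scikit-learn function.
def unmatched_weight_keys(class_weight, labels):
    """Return the class_weight keys that equal no label in `labels`."""
    present = set(labels)
    return [key for key in class_weight if key not in present]

y_train = [0, 0, 1, 0, 1]  # toy integer labels

# String keys never equal integer labels, so both are reported.
bad = unmatched_weight_keys({'0': 1, '1': 10}, y_train)
print(bad)  # ['0', '1']

# Integer keys match the labels, so nothing is reported.
good = unmatched_weight_keys({0: 1, 1: 10}, y_train)
print(good)  # []
```

An empty result means every weight refers to a class that actually occurs in the training labels.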
5. You have a dataset with 95% class 0 and 5% class 1. You want to train a model that handles this imbalance. Which approach is best to improve minority class recall?
hard
A. Train the model without any imbalance handling
B. Only use SMOTE without changing class weights
C. Only set class_weight='balanced' without oversampling
D. Use SMOTE to create synthetic minority samples and set class_weight='balanced' in the model
Solution
Step 1: Understand dataset imbalance
With a 95% vs 5% split, the minority class is very small and the model may ignore it.
Step 2: Combine SMOTE and class weights
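SMOTE creates synthetic minority samples to balance the data, while class_weight='balanced' tells the model to focus more on the minority class during training.
Step 3: Why combining is best
Using both together addresses minority recall from the data side and the loss side. Note that class_weight='balanced' derives its weights from class frequencies, so if SMOTE has already equalized the classes the weights come out equal; the combination adds the most when SMOTE only partially oversamples.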
Final Answer:
Use SMOTE to create synthetic minority samples and set class_weight='balanced' in the model -> Option D
Quick Check:
Combine oversampling + class weights for best minority recall [OK]
Hint: Combine SMOTE and class_weight='balanced' for best results [OK]
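To see why this question targets recall rather than accuracy, consider a model that ignores the minority class entirely, as Step 1 warns. The sketch below uses plain Python with toy labels matching the 95/5 split; the function names accuracy and recall are local helpers written for this example, not library calls.

```python
# Local helper functions for illustration; not library calls.
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

y_true = [0] * 95 + [1] * 5   # 95% class 0, 5% class 1
y_pred = [0] * 100            # a model that always predicts the majority class

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(acc)  # 0.95
print(rec)  # 0.0
```

The model looks strong on accuracy while finding none of the minority samples, which is the failure that SMOTE and class weights are meant to counter.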