
Imbalanced class handling (SMOTE, class weights) in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why use SMOTE for imbalanced data?
Imagine you have a dataset where one class is much smaller than the other. Why would you use SMOTE (Synthetic Minority Over-sampling Technique) instead of just duplicating minority class samples?
A. SMOTE creates new synthetic samples by mixing existing minority samples, which helps the model learn better decision boundaries.
B. SMOTE removes majority class samples to balance the dataset, reducing training time.
C. SMOTE duplicates minority samples exactly to increase their count without changing data distribution.
D. SMOTE randomly deletes samples from both classes to balance the dataset.
💡 Hint
Think about how creating new data points can help the model generalize better than just copying existing ones.
Predict Output
intermediate
Output of class weight usage in logistic regression
What will be the output of the following code snippet regarding the model's class weight attribute?
ML Python
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(class_weight={0:1, 1:5})
print(model.class_weight)
A. {0: 1, 1: 5}
B. balanced
C. None
D. Raises TypeError
💡 Hint
Check what value is assigned to class_weight in the model initialization.
Metrics
advanced
Choosing the best metric for imbalanced classification
You trained a model on a dataset with 95% of class 0 and 5% of class 1. Which metric is best to evaluate your model's performance on the minority class?
A. Precision for class 1, to measure how many predicted positives are correct.
B. Mean Squared Error, because it measures prediction error.
C. Accuracy, because it shows overall correct predictions.
D. Recall for class 1, to measure how many actual positives are found.
💡 Hint
Think about which metric helps find most of the minority class samples.
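To see why accuracy can mislead on the 95% / 5% split described in this question, here is a minimal pure-Python sketch. The `always_zero` predictions and the helpers `accuracy` and `recall_for_class` are hypothetical, written only for illustration; they are not part of any library.

```python
# Hypothetical labels matching the question: 95 samples of class 0, 5 of class 1.
y_true = [0] * 95 + [1] * 5

# A degenerate "model" that always predicts the majority class.
always_zero = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall_for_class(y_true, y_pred, label):
    """Of the samples that truly belong to `label`, the fraction predicted as `label`."""
    preds_for_actual = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds_for_actual:
        return 0.0
    return sum(p == label for p in preds_for_actual) / len(preds_for_actual)


print(accuracy(y_true, always_zero))             # 0.95
print(recall_for_class(y_true, always_zero, 1))  # 0.0
```

The accuracy looks strong even though no minority sample is found, which is the gap a per-class metric exposes.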
🔧 Debug
advanced
Why does this SMOTE code raise an error?
What error will this code raise and why?
from imblearn.over_sampling import SMOTE
X = [[1,2],[3,4],[5,6]]
y = [0,0,1]
smote = SMOTE(sampling_strategy='minority')
X_res, y_res = smote.fit_resample(X, y)
A. ValueError: Expected 2D array, got 1D array instead.
B. No error, code runs successfully.
C. ValueError: the minority class has only 1 sample, but SMOTE's default k_neighbors=5 needs at least 6 samples in that class.
D. TypeError: 'list' object has no attribute 'fit'.
💡 Hint
SMOTE needs enough samples in the minority class to create synthetic samples.
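One way to catch this situation before resampling is to compare the smallest class size with the neighbour count. The sketch below is a hypothetical helper (`max_usable_k_neighbors` is not an imbalanced-learn function); it assumes the default of 5 neighbours and that each sample needs that many other samples of its own class.

```python
from collections import Counter


def max_usable_k_neighbors(y, requested_k=5):
    """Largest neighbour count that the smallest class can support.

    Each sample needs k other samples of its own class as neighbours,
    so a class with n samples supports at most n - 1 neighbours.
    """
    smallest_class_size = min(Counter(y).values())
    return min(requested_k, smallest_class_size - 1)


y = [0, 0, 1]  # labels from the question: class 1 has a single sample
print(max_usable_k_neighbors(y))  # 0 -> no neighbours available to interpolate with

y_bigger = [0] * 20 + [1] * 4  # hypothetical labels with 4 minority samples
print(max_usable_k_neighbors(y_bigger))  # 3 -> a value that could be passed as k_neighbors
```

A result of 0 means there is nothing to interpolate between, so collecting more minority samples (or using a different technique) is needed before SMOTE can help.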
Model Choice
expert
Best approach for highly imbalanced multi-class classification
You have a multi-class dataset with 4 classes, where one class is only 1% of data. You want to improve model performance on the rare class. Which approach is best?
A. Apply SMOTE oversampling to all classes equally before training.
B. Use class weights in the model to give higher importance to the rare class.
C. Remove the rare class samples to simplify the problem.
D. Use accuracy as the only metric to evaluate the model.
💡 Hint
Think about how to handle imbalance without losing data or misleading metrics.

Practice

1. What is the main purpose of using SMOTE in machine learning?
easy
A. To create synthetic samples for minority classes to balance the dataset
B. To reduce the size of the majority class by removing samples
C. To increase the number of features in the dataset
D. To randomly shuffle the dataset before training

Solution

  1. Step 1: Understand SMOTE's role in imbalanced data

    SMOTE stands for Synthetic Minority Over-sampling Technique and it creates new synthetic samples for the minority class.
  2. Step 2: Compare options with SMOTE's function

    Only option A, "To create synthetic samples for minority classes to balance the dataset", correctly describes SMOTE's purpose: balancing classes by adding synthetic minority samples.
  3. Final Answer:

    To create synthetic samples for minority classes to balance the dataset -> Option A
  4. Quick Check:

    SMOTE = Synthetic samples for minority [OK]
Hint: SMOTE = make new minority samples to balance [OK]
Common Mistakes:
  • Thinking SMOTE removes majority samples
  • Confusing SMOTE with feature engineering
  • Assuming SMOTE shuffles data
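The difference between duplicating and synthesizing can be sketched in a few lines. This is a simplified illustration, not imbalanced-learn's implementation: real SMOTE picks the neighbour from the k nearest minority samples, whereas here the neighbour is chosen by hand, and `interpolate` and the `minority` points are hypothetical.

```python
import random


def interpolate(sample, neighbor, rng):
    """Return a point on the line segment between two minority samples."""
    gap = rng.random()  # random fraction in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


rng = random.Random(42)
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]  # hypothetical minority-class points

# Duplication would repeat an existing row; interpolation creates a new one
# that lies somewhere between the chosen sample and its neighbour.
sample, neighbor = minority[0], minority[1]
synthetic = interpolate(sample, neighbor, rng)
print(synthetic)
```

Each feature of the synthetic point falls between the corresponding features of the two source points, so the new sample stays inside the region the minority class already occupies.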
2. Which of the following is the correct way to set class weights in scikit-learn's LogisticRegression?
easy
A. LogisticRegression(class_weight='balanced')
B. LogisticRegression(weight_class='balanced')
C. LogisticRegression(classweights='balanced')
D. LogisticRegression(weights='balanced')

Solution

  1. Step 1: Recall scikit-learn parameter for class weights

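    The correct parameter name is class_weight. It accepts 'balanced', which sets each class's weight inversely proportional to its frequency in the training labels, or a dictionary mapping labels to weights.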
  2. Step 2: Match options with correct syntax

    Only LogisticRegression(class_weight='balanced') uses the exact parameter class_weight='balanced'.
  3. Final Answer:

    LogisticRegression(class_weight='balanced') -> Option A
  4. Quick Check:

    Parameter name is class_weight [OK]
Hint: Use class_weight='balanced' exactly in model init [OK]
Common Mistakes:
  • Using wrong parameter names like weight_class
  • Misspelling class_weight
  • Passing weights instead of class_weight
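For intuition about what 'balanced' computes, the sketch below applies the formula documented by scikit-learn, n_samples / (n_classes * class_count), in plain Python. The helper name `balanced_class_weights` and the 95 / 5 label split are assumptions made for this example.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


y = [0] * 95 + [1] * 5  # hypothetical 95% / 5% split
weights = balanced_class_weights(y)
print({label: round(w, 3) for label, w in weights.items()})  # {0: 0.526, 1: 10.0}
```

The rare class ends up with a weight about 19 times larger than the common class, which matches the 95:5 ratio of the counts.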
3. Given this code snippet using SMOTE, what will it print for the lengths of X_resampled and y_resampled?
from imblearn.over_sampling import SMOTE
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)
print(len(X_resampled), len(y_resampled))
medium
A. 8 8
B. 6 6
C. 10 10
D. 12 12

Solution

  1. Step 1: Count original class samples

    Class 0 has 3 samples, class 1 has 3 samples, so dataset is balanced initially.
  2. Step 2: Understand SMOTE behavior on balanced data

    With its default sampling_strategy, SMOTE adds synthetic samples to every class except the largest until each one matches the largest class's size. Here both classes already have 3 samples, so no new samples are needed.
  3. Step 3: Check actual output

    Since classes are equal, no new samples are added. So output length remains 6.
  4. Final Answer:

    6 6 -> Option B
  5. Quick Check:

    Balanced classes, no new samples added [OK]
Hint: SMOTE adds samples only if classes are imbalanced [OK]
Common Mistakes:
  • Assuming SMOTE always doubles data
  • Ignoring original class counts
  • Confusing sample count with feature count
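The counting in this solution can be reproduced without running SMOTE. The sketch below is a hypothetical helper that mirrors the default behaviour as described here (bring every non-majority class up to the majority count); it is an illustration of the arithmetic, not imbalanced-learn's code.

```python
from collections import Counter


def synthetic_samples_needed(y):
    """Samples to add per non-majority class so it matches the largest class."""
    counts = Counter(y)
    majority_label, majority_count = counts.most_common(1)[0]
    return {
        label: majority_count - count
        for label, count in counts.items()
        if label != majority_label
    }


print(synthetic_samples_needed([0, 0, 0, 1, 1, 1]))  # {1: 0}
print(synthetic_samples_needed([0] * 8 + [1] * 3))   # {1: 5}
```

In the first case nothing is added, so the lengths stay at 6. In the second, hypothetical case the resampled data would have 8 + 3 + 5 = 16 rows.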
4. You wrote this code to apply class weights but the model accuracy is very low. What is the likely error?
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(class_weight={'0':1, '1':10})
model.fit(X_train, y_train)
medium
A. LogisticRegression does not support class weights
B. class_weight parameter does not accept dictionaries
C. Class weights keys should be integers, not strings
D. class_weight values must sum to 1

Solution

  1. Step 1: Check class_weight dictionary keys

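    Keys in class_weight must match the labels in y_train, including their type. Usually labels are the integers 0 and 1, not the strings '0' and '1'.
  2. Step 2: Understand impact of wrong keys

    If the keys are strings but the labels are integers, no key matches a class, so the intended weights cannot be applied. In recent scikit-learn versions this typically surfaces as a ValueError during fit rather than as a silently mis-weighted model.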
  3. Final Answer:

    Class weights keys should be integers, not strings -> Option C
  4. Quick Check:

    Keys must match label types [OK]
Hint: Match class_weight keys to label data types exactly [OK]
Common Mistakes:
  • Using string keys instead of integer keys
  • Thinking class_weight can't be a dict
  • Believing weights must sum to 1
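A quick pre-flight check makes this kind of mismatch visible before training. The helper `unmatched_labels` and the `y_train` values below are hypothetical, used only to show that the string '0' and the integer 0 are different dictionary keys.

```python
def unmatched_labels(class_weight, y):
    """Return the labels in y that have no matching key in class_weight."""
    return sorted(set(y) - set(class_weight))


y_train = [0, 0, 0, 1]  # hypothetical integer labels

print(unmatched_labels({'0': 1, '1': 10}, y_train))  # [0, 1] -> string keys match nothing
print(unmatched_labels({0: 1, 1: 10}, y_train))      # [] -> every label has a weight
```

An empty result means every label in the training data has a weight assigned to it.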
5. You have a dataset with 95% class 0 and 5% class 1. You want to train a model that handles this imbalance. Which approach is best to improve minority class recall?
hard
A. Train the model without any imbalance handling
B. Only use SMOTE without changing class weights
C. Only set class_weight='balanced' without oversampling
D. Use SMOTE to create synthetic minority samples and set class_weight='balanced' in the model

Solution

  1. Step 1: Understand dataset imbalance

    With a 95% vs 5% split, the minority class is very small and the model may ignore it.
  2. Step 2: Combine SMOTE and class weights

    SMOTE creates synthetic minority samples to balance the data, while class_weight='balanced' tells the model to penalize minority-class errors more heavily during training.
  3. Step 3: Why combining is best

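    Both techniques push the model toward the minority class, which is why the combination is the expected answer here. Note that if SMOTE fully balances the training set, class_weight='balanced' then resolves to equal weights for both classes, so the combination adds the most when the oversampling is only partial. Compare the options on a validation set rather than assuming a fixed ranking.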
  4. Final Answer:

    Use SMOTE to create synthetic minority samples and set class_weight='balanced' in the model -> Option D
  5. Quick Check:

    Combine oversampling + class weights for best minority recall [OK]
Hint: Combine SMOTE and class_weight='balanced' for best results [OK]
Common Mistakes:
  • Using only one method and expecting best recall
  • Ignoring imbalance completely
  • Assuming oversampling alone fixes all issues