ML Python programming · ~20 mins

Stratified K-fold in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ Stratified K-fold Master - get all challenges correct to earn this badge!
🧠 Conceptual · intermediate · 2:00
Why use Stratified K-fold instead of regular K-fold?

Imagine you have a dataset with two classes: 90% are class A and 10% are class B. You want to split the data into 5 folds for cross-validation.

Why is Stratified K-fold better than regular K-fold in this case?

A. It ensures each fold has approximately the same percentage of samples of each class as the whole dataset.
B. It randomly shuffles the data without considering class distribution, which is faster.
C. It creates folds with only one class each to simplify training.
D. It duplicates minority-class samples to balance the dataset before splitting.
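After answering, you can check the idea empirically. This sketch (a synthetic 90/10 label array; all names are illustrative) compares the class-B fraction in each fold under plain KFold and StratifiedKFold:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)  # 90% class A (0), 10% class B (1)

for name, splitter in [("KFold", KFold(n_splits=5)),
                       ("StratifiedKFold", StratifiedKFold(n_splits=5))]:
    # fraction of class B in each fold's test set
    ratios = [y[test].mean() for _, test in splitter.split(X, y)]
    print(name, [round(r, 2) for r in ratios])
    # unshuffled KFold folds vary wildly; stratified folds are all ~0.10
```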
Predict Output · intermediate · 2:00
Output of StratifiedKFold split indices

Given this code, what is the output of the printed train indices for the first fold?

ML Python
from sklearn.model_selection import StratifiedKFold
import numpy as np

X = np.array([[i] for i in range(10)])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

skf = StratifiedKFold(n_splits=2, shuffle=False)

for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
    if fold == 0:
        print(train_index.tolist())
A. [2, 3, 7, 8, 9]
B. [0, 1, 2, 3, 4]
C. [5, 6, 7, 8, 9]
D. [0, 1, 2, 3, 4, 5]
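To check your prediction, run the snippet and print both index sets for each fold (a quick self-check, not part of the graded problem):

```python
from sklearn.model_selection import StratifiedKFold
import numpy as np

X = np.array([[i] for i in range(10)])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

skf = StratifiedKFold(n_splits=2, shuffle=False)
# without shuffling, each class's samples are dealt to folds in order:
# class 0 -> {0,1} / {2,3}, class 1 -> {4,5,6} / {7,8,9}
for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
    print(f"fold {fold}: TRAIN {train_index.tolist()} TEST {test_index.tolist()}")
```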
Model Choice · advanced · 2:00
Choosing the best cross-validation method for imbalanced data

You have a dataset with 95% of samples in class 0 and 5% in class 1. You want to evaluate a classification model's performance reliably.

Which cross-validation method is best to use?

A. Leave-One-Out cross-validation
B. Random train-test split without cross-validation
C. Stratified K-fold cross-validation
D. Regular K-fold cross-validation without stratification
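A minimal sketch of evaluating a model on a 95/5 imbalanced dataset with a stratified splitter (the synthetic data via make_classification and the logistic-regression model are illustrative choices, not part of the problem):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# synthetic binary data: ~95% class 0, ~5% class 1
X, y = make_classification(n_samples=400, weights=[0.95], flip_y=0,
                           random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
# every fold sees roughly the same 95/5 class mix, so the score
# estimate is not distorted by folds that lack the minority class
print(scores.mean())
```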
Hyperparameter · advanced · 2:00
Effect of increasing n_splits in StratifiedKFold

What is the effect of increasing the number of splits (n_splits) in StratifiedKFold on the training and validation sets?

A. Both training and validation sets remain the same size regardless of n_splits.
B. Training sets become smaller and validation sets become smaller, leading to overfitting.
C. Training sets become smaller and validation sets become larger, increasing variance in performance estimates.
D. Training sets become larger and validation sets become smaller, reducing bias in performance estimates.
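You can observe the size trade-off directly. This sketch (a toy 100-sample set with an 80/20 class split; numbers are illustrative) prints the train/validation sizes of the first fold as n_splits grows:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

for k in (2, 5, 10):
    # look at the first fold only; all folds have (nearly) equal sizes
    train, test = next(iter(StratifiedKFold(n_splits=k).split(X, y)))
    print(f"n_splits={k}: train={len(train)}, validation={len(test)}")
```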
🔧 Debug · expert · 2:00
Why does this StratifiedKFold code raise an error?

Consider this code snippet:

from sklearn.model_selection import StratifiedKFold
import numpy as np

X = np.array([[i] for i in range(6)])
y = np.array([0, 0, 1, 1, 1, 1])

skf = StratifiedKFold(n_splits=3)

for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)

Running this code raises a ValueError. What is the cause?

A. The input arrays X and y have mismatched lengths.
B. The number of splits is greater than the number of members in the smallest class.
C. StratifiedKFold requires shuffle=True when n_splits > 2.
D. The labels y contain non-integer values, which StratifiedKFold cannot handle.