Stratified K-fold in ML Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is Stratified K-fold cross-validation?
Stratified K-fold splits a dataset into K parts (folds) so that each fold has approximately the same proportion of each class as the whole dataset. This keeps every training and test split representative of the overall class distribution.
beginner
Why use Stratified K-fold instead of regular K-fold?
Regular K-fold ignores the labels when splitting, so folds can end up with uneven class distributions, which can bias model evaluation. Stratified K-fold keeps class proportions consistent across folds, giving more reliable results, especially for imbalanced data.
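The difference can be seen on a small toy dataset. The sketch below is only an illustration: the data (16 samples of class 0 followed by 4 of class 1) and the helper name positives_per_test_fold are made up for this example, and neither splitter shuffles, which is the scikit-learn default.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy data: 16 samples of class 0 followed by 4 of class 1.
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 16 + [1] * 4)


def positives_per_test_fold(splitter):
    """Count the class-1 samples that land in each test fold."""
    return [int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)]


# Plain K-fold cuts the data into consecutive blocks and ignores y.
kfold_counts = positives_per_test_fold(KFold(n_splits=4))

# Stratified K-fold spreads each class across the folds.
stratified_counts = positives_per_test_fold(StratifiedKFold(n_splits=4))

print("KFold:          ", kfold_counts)
print("StratifiedKFold:", stratified_counts)
```

With unshuffled, label-sorted data like this, plain K-fold places every class-1 sample in the last test fold, while the stratified splitter gives each test fold one of them.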
intermediate
How does Stratified K-fold handle imbalanced datasets?
It ensures each fold has roughly the same percentage of samples from each class as the full dataset, so minority classes are fairly represented in every fold.
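As a rough check of this behaviour, the following sketch measures the minority-class fraction in every training and test split. The dataset (100 samples, 10% minority), the random seeds, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced data: 90 samples of class 0 and 10 of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)
rng.shuffle(y)  # mix the labels so they are not sorted by class

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

train_fractions = []
test_fractions = []
for train_idx, test_idx in skf.split(X, y):
    # The mean of 0/1 labels is the fraction of minority (class 1) samples.
    train_fractions.append(float(y[train_idx].mean()))
    test_fractions.append(float(y[test_idx].mean()))

print("minority fraction, train:", train_fractions)
print("minority fraction, test: ", test_fractions)
```

Because 10 minority samples divide evenly over 5 folds, every split here keeps the 10% minority share exactly; with counts that do not divide evenly, the fractions would only be approximately equal.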
advanced
In Stratified K-fold, what happens if the number of samples in a class is less than the number of folds?
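Some folds will not contain any samples from that class, because there are fewer samples than folds to place them in, so stratification is only approximate for that class. In scikit-learn, StratifiedKFold issues a warning in this case, and raises an error if every class has fewer samples than the number of folds.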
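A minimal sketch of this edge case, assuming a made-up dataset in which class 1 has only 2 samples but 3 folds are requested. The warning is captured with the standard warnings module so it can be inspected; its exact wording may differ between scikit-learn versions.

```python
import warnings

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: class 1 has only 2 samples, fewer than n_splits=3.
X = np.arange(12).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 2)

skf = StratifiedKFold(n_splits=3)

# Record any warnings raised while the folds are generated.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    folds = list(skf.split(X, y))

# Number of class-1 samples in each test fold.
minority_per_fold = [int(y[test_idx].sum()) for _, test_idx in folds]

print("class-1 samples per test fold:", minority_per_fold)
for w in caught:
    print("warning:", w.message)
```

At least one test fold ends up with no class-1 samples, so any per-class metric computed on that fold should be treated with care. Reducing n_splits to the size of the smallest class is one common way to avoid this.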
beginner
Write a simple Python code snippet to perform Stratified K-fold cross-validation using scikit-learn.
from sklearn.model_selection import StratifiedKFold
import numpy as np

# Six samples with one feature each, three per class
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 1, 1, 0, 1])

# Three folds: each test fold receives one sample from each class
skf = StratifiedKFold(n_splits=3)
for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
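The snippet above only prints the index arrays. To actually score a model on the stratified folds, one option is to pass the splitter to cross_val_score. This is a sketch under assumptions: the synthetic dataset from make_classification, the LogisticRegression model, and all parameter values are illustrative choices, not part of the original card.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical imbalanced dataset: roughly 80% class 0 and 20% class 1.
X, y = make_classification(
    n_samples=200, n_features=5, weights=[0.8, 0.2], random_state=0
)

model = LogisticRegression(max_iter=1000)

# Shuffle before splitting; random_state makes the folds reproducible.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold (accuracy is the default for classifiers).
scores = cross_val_score(model, X, y, cv=skf)

print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())
```

On imbalanced data, accuracy can look good even when the minority class is poorly predicted, so a different scoring argument such as "f1" or "balanced_accuracy" may be more informative.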
What is the main goal of Stratified K-fold cross-validation?
A. To randomly shuffle data without any class consideration
B. To keep class proportions consistent in each fold
C. To increase the number of folds beyond the dataset size
D. To use only one fold for training
Which type of problem benefits most from Stratified K-fold?
A. Classification with imbalanced classes
B. Regression with continuous targets
C. Unsupervised clustering
D. Data with no labels
If you have 5 folds, how does Stratified K-fold split the data?
A. Each fold has random samples ignoring class
B. Each fold has all samples of one class
C. Each fold has roughly 1/5 of each class
D. Each fold has only majority class samples
What happens if a class has fewer samples than the number of folds in Stratified K-fold?
A. Some folds may not contain samples from that class
B. The class samples are duplicated to fill all folds
C. The class is ignored during splitting
D. The number of folds is automatically reduced
Which Python library provides StratifiedKFold for easy use?
A. NumPy
B. TensorFlow
C. Matplotlib
D. scikit-learn
Explain how Stratified K-fold cross-validation works and why it is useful for classification problems.
Describe a situation where using Stratified K-fold is better than regular K-fold and why.