ML Python programming (~20 mins)

Cross-validation (K-fold) in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual (intermediate)
Why use K-fold cross-validation?

Imagine you have a small dataset and want to estimate how well your machine learning model will perform on new data. Why is K-fold cross-validation a better choice than a single train-test split?

A. It uses all data points for both training and testing, reducing bias in performance estimates.
B. It trains the model multiple times on the same training set to improve accuracy.
C. It splits data randomly once, which is faster and more reliable than multiple splits.
D. It only tests the model on the largest portion of data to get the best score.
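For intuition (illustrative only, not part of the question), a minimal sketch showing that across the K folds every sample lands in a test set exactly once, so all data points contribute to both training and evaluation:

```python
from sklearn.model_selection import KFold
import numpy as np

X = np.arange(10)  # 10 toy samples
kf = KFold(n_splits=5, shuffle=False)

test_indices = []
for train_idx, test_idx in kf.split(X):
    # collect the test fold from each of the 5 splits
    test_indices.extend(test_idx.tolist())

print(sorted(test_indices))  # every index 0..9 appears exactly once
```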
Predict Output (intermediate)
Output of K-fold split indices

What will be the output of the following Python code that uses KFold from scikit-learn?

ML Python
from sklearn.model_selection import KFold
import numpy as np

X = np.array([10, 20, 30, 40, 50])
kf = KFold(n_splits=2, shuffle=False)

splits = []
for train_index, test_index in kf.split(X):
    splits.append((train_index.tolist(), test_index.tolist()))

print(splits)
A. [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
B. [([0, 1, 2], [3, 4]), ([3, 4], [0, 1, 2])]
C. [([1, 2, 3], [0, 4]), ([0, 4], [1, 2, 3])]
D. [([2, 3, 4], [0, 1]), ([0, 1], [2, 3, 4])]
Model Choice (advanced)
Choosing K for K-fold cross-validation

You have a dataset with 1000 samples. You want to use K-fold cross-validation to estimate model performance. Which choice of K balances bias and variance best?

A. K = 5, because it provides a good balance between bias and variance for many datasets.
B. K = 1000, because leave-one-out cross-validation always gives the best estimate.
C. K = 1, because using the whole dataset for training and testing is most accurate.
D. K = 2, because fewer folds reduce computation time and variance.
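To see 5-fold cross-validation in action on a 1000-sample dataset, here is a sketch using scikit-learn's cross_val_score; the synthetic dataset and the logistic regression model are stand-ins chosen for illustration, not part of the question:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic stand-in for the 1000-sample dataset in the question
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# cv=5 trains and evaluates the model 5 times, once per held-out fold
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```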
Metrics (advanced)
Calculating average accuracy from K-fold results

You performed 4-fold cross-validation and got these accuracy scores for each fold: [0.82, 0.85, 0.80, 0.83]. What is the correct average accuracy to report?

A. 0.8250
B. 0.83
C. 0.825
D. 0.83 with standard deviation 0.02
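You can compute the summary statistics yourself with NumPy; this sketch uses the sample standard deviation (ddof=1), which is one common reporting convention:

```python
import numpy as np

scores = np.array([0.82, 0.85, 0.80, 0.83])
mean = scores.mean()
std = scores.std(ddof=1)  # sample standard deviation
print(f"{mean:.4f} +/- {std:.4f}")
```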
🔧 Debug (expert)
Why does this K-fold code raise an error?

Consider this Python code snippet using KFold. It raises an error. What is the cause?

ML Python
from sklearn.model_selection import KFold
import numpy as np

X = np.array([1, 2, 3])
kf = KFold(n_splits=5)

for train_index, test_index in kf.split(X):
    print('Train:', train_index, 'Test:', test_index)
A. IndexError because test indices exceed array length.
B. ValueError because n_splits (5) cannot be greater than the number of samples (3).
C. TypeError because KFold expects a list, not a numpy array.
D. RuntimeError because KFold requires shuffle=True for small datasets.
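After answering, one possible fix (an assumption for illustration, not the only option) is to cap n_splits at the number of samples; the variant below runs without error:

```python
from sklearn.model_selection import KFold
import numpy as np

X = np.array([1, 2, 3])
kf = KFold(n_splits=3)  # n_splits must not exceed len(X)

n_folds = 0
for train_index, test_index in kf.split(X):
    print('Train:', train_index, 'Test:', test_index)
    n_folds += 1
```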