Challenge - 5 Problems
Train/Val/Test Split Master
❓ Predict Output
intermediate
Output of PyTorch dataset split code
What will be the output of the following code snippet that splits a dataset into train, validation, and test sets using PyTorch's random_split?
PyTorch
from torch.utils.data import random_split
from torch.utils.data import Dataset

class DummyDataset(Dataset):
    def __init__(self, length):
        self.data = list(range(length))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

full_dataset = DummyDataset(100)
train_size = 70
val_size = 20
test_size = 10

train_set, val_set, test_set = random_split(full_dataset, [train_size, val_size, test_size])
print(len(train_set), len(val_set), len(test_set))
💡 Hint
Check how the sizes are passed to random_split and what lengths are printed.
Explanation
The random_split function divides the dataset into non-overlapping subsets of exactly the sizes provided. Here, 70, 20, and 10 sum to the dataset length of 100, so the code prints 70 20 10.
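As a minimal sketch of the same behavior, the snippet below repeats the split with a seeded generator and checks that the subsets are disjoint and cover the whole dataset. The seed value 42 and the use of a plain list in place of a Dataset subclass are assumptions made for illustration.

```python
import torch
from torch.utils.data import random_split

# A plain list is enough here: random_split only needs len() and indexing.
full_dataset = list(range(100))

# Assumption: 42 is an arbitrary seed; any fixed seed makes the split repeatable.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(
    full_dataset, [70, 20, 10], generator=generator
)

# Each returned Subset keeps the indices it draws from the original dataset.
train_idx = set(train_set.indices)
val_idx = set(val_set.indices)
test_idx = set(test_set.indices)

print(len(train_set), len(val_set), len(test_set))

# The three subsets share no index and together cover all 100 samples.
no_overlap = (
    train_idx.isdisjoint(val_idx)
    and train_idx.isdisjoint(test_idx)
    and val_idx.isdisjoint(test_idx)
)
print(no_overlap, len(train_idx | val_idx | test_idx))
```

Passing a seeded generator is optional, but without it the membership of each subset changes from run to run even though the lengths stay the same.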
🧠 Conceptual
intermediate
Purpose of validation set in train/val/test split
What is the main purpose of the validation set in a train/val/test split during machine learning model training?
💡 Hint
Think about which set helps decide model settings without biasing final evaluation.
Explanation
The validation set is used during model development to tune hyperparameters and to detect overfitting. It is kept separate from the test set, which is reserved for the final evaluation only.
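The roles of the three sets can be sketched with a toy example. Everything here is made up for illustration: the data, the constant-prediction "model", and the hypothetical shrink factor standing in for a hyperparameter.

```python
# Toy targets, already split into three disjoint parts (made-up numbers).
train_y = [2.0, 4.0, 6.0, 8.0]
val_y = [3.0, 5.0]
test_y = [4.0, 6.0]

def mse(prediction, targets):
    """Mean squared error of a single constant prediction."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)

# The "parameter" is learned from the training data only.
train_mean = sum(train_y) / len(train_y)

# Hypothetical hyperparameter: a shrink factor applied to the learned mean.
candidates = [0.5, 0.8, 1.0]

# The hyperparameter is chosen by validation error; the test set is not used here.
best_shrink = min(candidates, key=lambda s: mse(s * train_mean, val_y))

# The test set is used once, after every choice has been fixed.
test_error = mse(best_shrink * train_mean, test_y)
print(best_shrink, test_error)
```

Because the test set plays no part in choosing the shrink factor, the final test error is not biased by the selection step.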
❓ Hyperparameter
advanced
Choosing split ratios for train/val/test
Which of the following split ratios is most appropriate for a dataset with 10,000 samples to ensure reliable training, validation, and testing?
💡 Hint
Consider enough data for training and meaningful validation/testing.
Explanation
80/10/10 is a common split that balances enough training data with sufficient validation and test samples for reliable evaluation. With 10,000 samples it gives 8,000 for training and 1,000 each for validation and testing.
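The arithmetic behind a ratio-based split can be sketched as follows. The helper name split_sizes is hypothetical, and assigning leftover samples to the training split is one possible design choice, not the only one.

```python
def split_sizes(n_samples, ratios=(0.8, 0.1, 0.1)):
    """Turn split ratios into integer sizes that sum to exactly n_samples."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    sizes = [int(n_samples * r) for r in ratios]
    # Rounding down can leave a few samples unassigned;
    # this sketch gives them to the first (training) split.
    sizes[0] += n_samples - sum(sizes)
    return sizes

print(split_sizes(10_000))  # [8000, 1000, 1000]
print(split_sizes(1_003))   # [803, 100, 100]
```

Integer sizes that sum to the dataset length are the form that can be passed directly as the lengths argument of random_split.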
🔧 Debug
advanced
Identify error in PyTorch dataset splitting code
What error will the following code raise when trying to split a dataset into train, validation, and test sets?
PyTorch
from torch.utils.data import random_split

full_dataset = list(range(50))
train_set, val_set, test_set = random_split(full_dataset, [30, 15, 10])
💡 Hint
Check the sum of split sizes compared to dataset length.
Explanation
random_split requires the split sizes to sum to the dataset length. Here the sum is 30 + 15 + 10 = 55, which does not match the dataset length of 50, so the call raises a ValueError.
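A short sketch of the failure and one possible repair is shown below. Shrinking the test split to the remaining 5 samples is an assumption about the intended split, not the only valid fix.

```python
from torch.utils.data import random_split

full_dataset = list(range(50))
sizes = [30, 15, 10]  # sums to 55, but the dataset has only 50 items

try:
    random_split(full_dataset, sizes)
except ValueError as err:
    print("ValueError:", err)

# Assumed fix: derive the last size from the dataset length so the
# sizes always add up to len(full_dataset).
fixed_sizes = [30, 15, len(full_dataset) - 30 - 15]
train_set, val_set, test_set = random_split(full_dataset, fixed_sizes)
print(len(train_set), len(val_set), len(test_set))
```

Computing the final size from the dataset length, rather than hard-coding all three numbers, keeps the split valid if the dataset later grows or shrinks.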
❓ Model Choice
expert
Best approach to split highly imbalanced dataset
You have a highly imbalanced classification dataset with 1% positive and 99% negative samples. Which approach is best to split the dataset into train, validation, and test sets to maintain class distribution?
💡 Hint
Think about preserving class balance in each subset for fair evaluation.
Explanation
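Stratified splitting gives each subset approximately the same class distribution as the full dataset, which is crucial for imbalanced data. With only 1% positive samples, a purely random split can leave the validation or test set with too few positives to evaluate the model reliably.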
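One way to picture stratified splitting is a hand-rolled sketch that shuffles and splits each class separately. The function name stratified_split, the 80/10/10 ratios, the seed, and the synthetic labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists, splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * ratios[1])
        n_test = round(len(indices) * ratios[2])
        n_train = len(indices) - n_val - n_test
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test

# Made-up data: 10,000 labels with 1% positives.
labels = [1] * 100 + [0] * 9_900
train, val, test = stratified_split(labels)

for name, split in [("train", train), ("val", val), ("test", test)]:
    positives = sum(labels[i] for i in split)
    print(name, len(split), positives)
```

Each subset ends up with the same 1% share of positives as the full dataset. In practice a library routine is usually preferable; for example, scikit-learn's train_test_split accepts a stratify argument and can be applied twice to obtain three subsets.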