Train/val/test split in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of PyTorch dataset split code
What will be the output of the following code snippet that splits a dataset into train, validation, and test sets using PyTorch's random_split?
PyTorch
from torch.utils.data import random_split
from torch.utils.data import Dataset

class DummyDataset(Dataset):
    def __init__(self, length):
        self.data = list(range(length))
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        return self.data[idx]

full_dataset = DummyDataset(100)
train_size = 70
val_size = 20
test_size = 10
train_set, val_set, test_set = random_split(full_dataset, [train_size, val_size, test_size])

print(len(train_set), len(val_set), len(test_set))
A. 33 33 34
B. 100 0 0
C. 70 20 10
D. 50 25 25
💡 Hint
Check how the sizes are passed to random_split and what lengths are printed.
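A related point the snippet above does not show: `random_split` also accepts a `generator` argument, which makes the shuffle repeatable. The following is a minimal sketch, assuming a toy `TensorDataset` of 10 samples; `seeded_split` is a hypothetical helper name introduced only for this illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset: 10 samples with a single feature each.
dataset = TensorDataset(torch.arange(10).unsqueeze(1))

def seeded_split(ds, lengths, seed):
    """Split ds into subsets of the given lengths using a seeded generator."""
    generator = torch.Generator().manual_seed(seed)
    return random_split(ds, lengths, generator=generator)

first = seeded_split(dataset, [6, 2, 2], seed=0)
second = seeded_split(dataset, [6, 2, 2], seed=0)

# Each subset has exactly the requested length.
print([len(subset) for subset in first])

# The same seed yields the same sample indices in every subset.
print(all(a.indices == b.indices for a, b in zip(first, second)))
```

Fixing the seed is useful when a validation score has to be compared across runs, since every run then sees the same held-out samples.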
🧠 Conceptual
intermediate
Purpose of validation set in train/val/test split
What is the main purpose of the validation set in a train/val/test split during machine learning model training?
A. To tune hyperparameters and prevent overfitting
B. To train the model with more data
C. To evaluate the model's performance on unseen data after training
D. To test the model's final accuracy before deployment
💡 Hint
Think about which set helps decide model settings without biasing final evaluation.
Hyperparameter
advanced
Choosing split ratios for train/val/test
Which of the following split ratios is most appropriate for a dataset with 10,000 samples to ensure reliable training, validation, and testing?
A. 60% train, 20% validation, 20% test
B. 50% train, 25% validation, 25% test
C. 90% train, 5% validation, 5% test
D. 80% train, 10% validation, 10% test
💡 Hint
Consider enough data for training and meaningful validation/testing.
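To compare the options concretely, it can help to turn each percentage triple into sample counts for the 10,000-sample dataset. This is only an arithmetic sketch; `counts_for` is a hypothetical helper, not part of PyTorch, and it uses integer percentages to avoid floating-point rounding.

```python
n_samples = 10_000

# The four options from the question, as (train, validation, test) percentages.
options = {
    "A": (60, 20, 20),
    "B": (50, 25, 25),
    "C": (90, 5, 5),
    "D": (80, 10, 10),
}

def counts_for(n, percents):
    """Convert integer percentages into per-split sample counts."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    return tuple(n * p // 100 for p in percents)

for label, percents in options.items():
    print(label, counts_for(n_samples, percents))
```

Looking at the absolute counts, rather than the percentages alone, shows how many samples each option leaves for validation and testing.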
🔧 Debug
advanced
Identify error in PyTorch dataset splitting code
What error, if any, will the following code raise when trying to split a dataset into train, validation, and test sets?
PyTorch
from torch.utils.data import random_split
full_dataset = list(range(50))
train_set, val_set, test_set = random_split(full_dataset, [30, 15, 10])
A. TypeError: 'list' object has no attribute '__len__'
B. ValueError: Sum of input lengths does not equal the length of the input dataset
C. TypeError: random_split expects a Dataset object, not a list
D. No error, splits successfully
💡 Hint
Check the sum of split sizes compared to dataset length.
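One way to keep the sizes consistent by construction is to choose the train and validation sizes and derive the test size as the remainder. The sketch below is plain Python; `split_lengths` is a hypothetical helper, and the numbers are illustrative.

```python
def split_lengths(n_total, n_train, n_val):
    """Return [train, val, test] lengths that always sum to n_total."""
    n_test = n_total - n_train - n_val
    if min(n_train, n_val, n_test) < 0:
        raise ValueError("train and validation sizes exceed the dataset length")
    return [n_train, n_val, n_test]

lengths = split_lengths(50, 30, 15)
print(lengths)               # [30, 15, 5]
print(sum(lengths) == 50)    # True
```

The resulting list can be passed as the second argument to `random_split`, since its entries are guaranteed to add up to the dataset length.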
Model Choice
expert
Best approach to split highly imbalanced dataset
You have a highly imbalanced classification dataset with 1% positive and 99% negative samples. Which approach is best to split the dataset into train, validation, and test sets to maintain class distribution?
A. Use stratified splitting to keep class proportions in each subset
B. Manually shuffle and split the dataset randomly
C. Use random_split from PyTorch directly without stratification
D. Split only into train and test, skip validation
💡 Hint
Think about preserving class balance in each subset for fair evaluation.
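For reference, a stratified split can be sketched in plain Python by shuffling and slicing the indices of each class separately. This is one possible approach under stated assumptions, not a built-in PyTorch routine: `stratified_indices` is a hypothetical helper, the 80/10/10 percentages are illustrative, and the labels are synthetic (10 positives, 990 negatives).

```python
import random
from collections import defaultdict

def stratified_indices(labels, percents=(80, 10, 10), seed=0):
    """Return train/val/test index lists that keep per-class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = len(indices) * percents[0] // 100
        n_val = len(indices) * percents[1] // 100
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, val, test

# Synthetic imbalanced labels: 1% positive, 99% negative.
labels = [1] * 10 + [0] * 990
train_idx, val_idx, test_idx = stratified_indices(labels)

for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    positives = sum(labels[i] for i in idx)
    print(name, len(idx), positives)
```

Each index list can then be wrapped with `torch.utils.data.Subset(dataset, indices)` to obtain dataset objects for the three splits.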