Recall & Review
beginner
What is the purpose of splitting data into train, validation, and test sets?
Splitting data lets us fit the model on one part (train), tune settings on another (validation), and finally check how well it works on unseen data (test). Keeping the sets separate stops information from the evaluation data leaking into training, so the reported performance is an honest estimate.
beginner
In PyTorch, which function can help split datasets into train and validation sets?
You can use torch.utils.data.random_split() to split a dataset randomly into parts such as train and validation sets.
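As a minimal sketch of the answer above, the snippet below splits a toy dataset with torch.utils.data.random_split(). The tensor shapes, the 80/10/10 proportions, and the seed are illustrative assumptions, not values from the card.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset (made-up data): 100 samples, 4 features each, binary labels.
features = torch.randn(100, 4)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

# Express an 80/10/10 split as absolute lengths that sum to len(dataset).
n_train = int(0.8 * len(dataset))
n_val = int(0.1 * len(dataset))
n_test = len(dataset) - n_train - n_val  # remainder goes to the test set

# A seeded generator makes the random assignment reproducible.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test], generator=generator
)

print(len(train_set), len(val_set), len(test_set))
```

Each returned object is a Subset that can be passed to a DataLoader in the usual way. Giving the remainder to the last split ensures the lengths always add up to the dataset size.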
beginner
Why should the test set be kept separate and only used once?
The test set shows how the model performs on new, unseen data. Using it only once prevents accidentally tuning the model to the test data, which would give a false sense of accuracy.
beginner
What typical proportions are used for train, validation, and test splits?
Common splits are 70-80% for training, 10-15% for validation, and 10-15% for testing. These can vary depending on data size and problem.
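To make the proportions concrete, here is a small sketch that turns percentages into sample counts. The helper name split_sizes and its defaults are hypothetical, chosen only for illustration.

```python
def split_sizes(n_samples, train_pct=80, val_pct=10):
    """Return (n_train, n_val, n_test) for the given whole-number percentages.

    Hypothetical helper: the test set receives whatever remains, so the
    three counts always sum to n_samples.
    """
    if train_pct + val_pct >= 100:
        raise ValueError("train_pct + val_pct must leave room for a test set")
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test


print(split_sizes(1000))          # 80/10/10 split of 1000 samples
print(split_sizes(1000, 70, 15))  # 70/15/15 split of 1000 samples
```

Integer arithmetic is used here to avoid floating-point rounding surprises when the dataset size is not a multiple of 100.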
beginner
How does validation data help during model training?
Validation data helps check the model's performance during training to adjust settings like learning rate or stop training early to avoid overfitting.
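One way validation data can drive early stopping is sketched below. The function name, the patience rule, and the loss values are illustrative assumptions; a real training loop would compute each validation loss after an epoch instead of reading it from a list.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. If that never happens, the last
    epoch index is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Made-up validation losses: they improve, then start rising (overfitting).
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch)
```

With these numbers the best loss occurs at epoch 2, and training stops at epoch 4 after two epochs without improvement.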
What is the main use of the validation set?
The validation set is used to tune hyperparameters and to monitor the model during training.
Which PyTorch function helps split datasets randomly?
torch.utils.data.random_split() splits datasets randomly into parts.
Why should the test set be used only once?
Using the test set only once prevents tuning the model to test data, ensuring honest evaluation.
A common split for train/validation/test is:
80% train, 10% validation, and 10% test is a common and balanced split.
What happens if you train and test on the same data?
Training and testing on the same data hides overfitting: the score looks good because the model has already seen those examples, so the model may still fail on new data.
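The effect can be illustrated with a deliberately extreme toy "model" that only memorizes its training examples. Everything here (the labeling rule, the data ranges, the fallback guess) is a made-up assumption for illustration.

```python
def label_for(x):
    # Hypothetical ground-truth rule, used only to generate toy labels.
    return 1 if x % 3 == 0 else 0


train = [(x, label_for(x)) for x in range(0, 80)]
held_out = [(x, label_for(x)) for x in range(80, 100)]

# "Model" that memorizes every training example and guesses 0 otherwise.
memory = dict(train)


def predict(x):
    return memory.get(x, 0)


def accuracy(pairs):
    correct = sum(1 for x, y in pairs if predict(x) == y)
    return correct / len(pairs)


train_acc = accuracy(train)
held_out_acc = accuracy(held_out)
print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {held_out_acc:.2f}")
```

The memorizer scores perfectly on the data it was trained on but noticeably worse on the held-out examples, which is why only the held-out score says anything about new data.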
Explain why we split data into train, validation, and test sets and how each is used.
Think about how to avoid cheating and improve model reliability.
Describe how to split a PyTorch dataset into train and validation sets and why random splitting is useful.
Consider how to fairly divide data for training and tuning.