beginner

What is the purpose of splitting data into train, validation, and test sets?

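Splitting data lets us fit the model on one part (train), tune settings and compare candidate models on another (validation), and finally estimate how well the chosen model works on new data (test). Keeping the three parts separate stops information about the evaluation data from leaking into training, so the reported performance is an honest estimate.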

beginner

In PyTorch, which function can help split datasets into train and validation sets?

You can use torch.utils.data.random_split() to randomly divide a dataset into non-overlapping subsets, such as train and validation sets. You pass it the dataset and the length of each subset.
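As a minimal sketch of that call, the example below splits a toy dataset 80/20. The random tensors, the seed value, and the variable names are assumptions made for illustration only.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset (hypothetical data): 100 samples, 4 features each, binary labels.
features = torch.randn(100, 4)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

# A seeded generator makes the random split reproducible across runs.
generator = torch.Generator().manual_seed(42)

# The lengths must sum to len(dataset): 80 for training, 20 for validation.
train_set, val_set = random_split(dataset, [80, 20], generator=generator)

print(len(train_set), len(val_set))  # 80 20
```

Each returned subset can be passed to a DataLoader like any other dataset. Fixing the seed is a design choice that keeps the same samples in the validation set every time the script runs.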

beginner

Why should the test set be kept separate and only used once?

The test set shows how the model performs on new, unseen data. Using it only once, after all tuning is finished, prevents you from accidentally tuning the model to the test data, which would make the measured accuracy overly optimistic.

beginner

What typical proportions are used for train, validation, and test splits?

Common splits are 70-80% for training, 10-15% for validation, and 10-15% for testing. These can vary with the size of the dataset and the problem: with very large datasets, a smaller fraction is often enough for validation and testing.
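To turn percentages like these into concrete subset sizes, one possible approach is sketched below. The helper name split_lengths and its default percentages are hypothetical, and integer arithmetic is used so that rounding never drops a sample.

```python
def split_lengths(n_samples, train_pct=80, val_pct=10):
    """Return (train, val, test) sizes that sum exactly to n_samples."""
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    # Give the remainder to the test set so no sample is lost to rounding.
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test

print(split_lengths(1000))  # (800, 100, 100)
print(split_lengths(1005))  # (804, 100, 101)
```

The resulting sizes can be passed as the list of lengths to torch.utils.data.random_split().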

beginner

How does validation data help during model training?

Validation data lets you measure the model's performance on held-out data during training. You can use that measurement to adjust settings such as the learning rate, or to stop training early before the model overfits.
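The early-stopping idea can be sketched without a real model by looking only at a sequence of per-epoch validation losses. The function name, the patience value, and the loss numbers below are assumptions for illustration, not part of any library.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. If that never happens, the index of
    the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation losses: improving at first, then rising (overfitting).
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
print(early_stopping_epoch(losses))  # 4
```

In a real training loop the losses would be computed on the validation set after each epoch, and the model weights from the best epoch (index 2 here) would usually be the ones kept.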

What is the main use of the validation set?

A. To collect more data

B. To train the model

C. To test the final model performance

D. To tune model settings during training

Which PyTorch function helps split datasets randomly?

A. torch.split()

B. torch.utils.data.random_split()

C. torch.chunk()

D. torch.data.split_dataset()

Why should the test set be used only once?

A. To avoid overfitting to test data

B. To speed up training

C. To save memory

D. To increase training data size

A common split for train/validation/test is:

A. 80% train, 10% val, 10% test

B. 90% train, 5% val, 5% test

C. 30% train, 30% val, 40% test

D. 50% train, 25% val, 25% test

What happens if you train and test on the same data?

A. Training is faster

B. Model performs well on new data

C. Model may overfit and perform poorly on new data

D. Model accuracy decreases on training data

Explain why we split data into train, validation, and test sets and how each is used.

Describe how to split a PyTorch dataset into train and validation sets and why random splitting is useful.