Recall & Review

beginner

What is the purpose of a train-test split in machine learning?

It is used to divide data into two parts: one for training the model and one for testing its performance on new, unseen data.

Click to reveal answer

beginner

What typical ratio is used for train-test split?

A common ratio is 70% to 80% of data for training and 20% to 30% for testing.

Click to reveal answer

beginner

Why should the test set be separate from the training set?

To check if the model can generalize well to new data and avoid overfitting to the training data.

Click to reveal answer

beginner

What Python library and function is commonly used for train-test splitting?

The scikit-learn library with the function train_test_split from sklearn.model_selection.

Click to reveal answer

intermediate

How does random_state parameter affect train-test split?

It sets a seed for random shuffling so the split is reproducible and consistent across runs.

Click to reveal answer

What is the main goal of splitting data into train and test sets?

ATo evaluate model performance on unseen data

BTo increase training data size

CTo reduce the number of features

DTo speed up model training

Which of these is a common train-test split ratio?

A50% train, 50% test

B90% train, 10% test

C70% train, 30% test

D10% train, 90% test

What does the random_state parameter do in train_test_split?

ASets the size of the test set

BControls random shuffling for reproducibility

CSpecifies the model type

DNormalizes the data

Why should the test set not be used during training?

ATo avoid overfitting and get an unbiased evaluation

BIt helps tune hyperparameters

CIt contains only noise

DIt is smaller than the train set

Which Python function is used to split data into train and test sets?

Asplit_data()

Bdata_split()

Ctrain_test()

Dtrain_test_split()

Explain why train-test split is important in machine learning.

Describe how you would perform a train-test split using Python.