0
0
ML Pythonprogramming~5 mins

Train-test split in ML Python - Cheat Sheet & Quick Revision

Choose your learning style9 modes available
Recall & Review
beginner
What is the purpose of a train-test split in machine learning?
It is used to divide data into two parts: one for training the model and one for testing its performance on new, unseen data.
Click to reveal answer
beginner
What typical ratio is used for train-test split?
A common ratio is 70% to 80% of data for training and 20% to 30% for testing.
Click to reveal answer
beginner
Why should the test set be separate from the training set?
To check if the model can generalize well to new data and avoid overfitting to the training data.
Click to reveal answer
beginner
What Python library and function is commonly used for train-test splitting?
The scikit-learn library with the function train_test_split from sklearn.model_selection.
Click to reveal answer
intermediate
How does random_state parameter affect train-test split?
It sets a seed for random shuffling so the split is reproducible and consistent across runs.
Click to reveal answer
What is the main goal of splitting data into train and test sets?
ATo evaluate model performance on unseen data
BTo increase training data size
CTo reduce the number of features
DTo speed up model training
Which of these is a common train-test split ratio?
A50% train, 50% test
B90% train, 10% test
C70% train, 30% test
D10% train, 90% test
What does the random_state parameter do in train_test_split?
ASets the size of the test set
BControls random shuffling for reproducibility
CSpecifies the model type
DNormalizes the data
Why should the test set not be used during training?
ATo avoid overfitting and get an unbiased evaluation
BIt helps tune hyperparameters
CIt contains only noise
DIt is smaller than the train set
Which Python function is used to split data into train and test sets?
Asplit_data()
Bdata_split()
Ctrain_test()
Dtrain_test_split()
Explain why train-test split is important in machine learning.
Describe how you would perform a train-test split using Python.