0
0
ML Pythonprogramming~20 mins

Train-test split in ML Python - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
Train-Test Split Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
2:00remaining
Why do we use a train-test split in machine learning?

Choose the best reason why splitting data into training and testing sets is important.

ATo evaluate how well the model performs on unseen data.
BTo increase the size of the training data for better learning.
CTo reduce the number of features in the dataset.
DTo speed up the training process by using less data.
Attempts:
2 left
Predict Output
intermediate
2:00remaining
Output of train-test split sizes

What will be the output sizes of training and testing sets after this code runs?

ML Python
from sklearn.model_selection import train_test_split
X = list(range(100))
y = [x * 2 for x in X]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
print(len(X_train), len(X_test))
A75 25
B25 75
C70 30
D80 20
Attempts:
2 left
Hyperparameter
advanced
2:00remaining
Choosing the test_size parameter

Which test_size value is best if you want to maximize training data but still have a reliable test set?

A0.05
B0.2
C0.5
D0.8
Attempts:
2 left
🔧 Debug
advanced
2:00remaining
Identify the error in this train-test split code

What error will this code raise?

ML Python
from sklearn.model_selection import train_test_split
X = [1, 2, 3]
y = [4, 5]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
ANo error, code runs successfully
BTypeError: test_size must be an integer or float
CSyntaxError: invalid syntax
DValueError: Found input variables with inconsistent numbers of samples
Attempts:
2 left
Model Choice
expert
3:00remaining
Best practice for train-test split with imbalanced classes

You have a classification dataset with very imbalanced classes. Which train-test split approach is best to keep class proportions consistent?

ASplit data manually by slicing the dataset in order.
BRandomly split data without stratify to avoid bias.
CUse train_test_split with stratify parameter set to the target labels.
DUse only training data and skip testing to avoid imbalance issues.
Attempts:
2 left