What if your perfect model is just fooling you? Discover how to truly test it!
Why Train-Test Split in Python ML? - Purpose & Use Cases
Imagine you built a model to predict house prices. You test it on the same data you trained it with. It looks perfect! But when you show it new houses, it fails badly.
Testing on training data hides mistakes. It's like studying the answer key and then taking the same test: you can't tell whether your model really learned or just memorized. This leads to false confidence and poor real-world results.
Train-test split solves this by keeping some data aside. The model learns on one part (train) and is tested on unseen data (test). This shows how well it will perform on new, real data.
```python
# Wrong: evaluating on the same data the model was trained on
model.fit(data, labels)
predictions = model.predict(data)
```
```python
from sklearn.model_selection import train_test_split

# Hold out part of the data, then evaluate on the unseen portion
train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.25, random_state=42
)
model.fit(train_data, train_labels)
predictions = model.predict(test_data)
```
It enables honest evaluation of your model's true ability to predict new data.
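To see the difference concretely, here is a minimal runnable sketch (assuming scikit-learn is installed, using its built-in Iris dataset and a decision tree as an illustrative model): the score on the training data looks perfect, while the score on the held-out test set reflects real predictive ability.

```python
# Sketch: comparing the optimistic training score with the honest test score.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset (features and class labels)
data, labels = load_iris(return_X_y=True)

# Keep 25% of the data aside for testing; random_state makes the split reproducible
train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.25, random_state=0
)

# An unpruned decision tree will memorize the training data
model = DecisionTreeClassifier(random_state=0)
model.fit(train_data, train_labels)

# Training accuracy is misleadingly perfect; test accuracy is the honest estimate
print("Train accuracy:", model.score(train_data, train_labels))
print("Test accuracy:", model.score(test_data, test_labels))
```

The gap between the two printed scores is exactly the "false confidence" that testing on training data creates.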
Before launching a spam filter, companies test it on emails it never saw. This ensures it catches new spam, not just old examples.
Testing on training data gives false confidence.
Train-test split keeps data separate for honest testing.
This helps build models that work well in the real world.