How to Compare ML Models in Python Using sklearn
To compare machine learning models in Python, use sklearn to train each model on the same data, then evaluate their performance with metrics such as accuracy_score for classification or mean_squared_error for regression. Comparing the results side-by-side lets you choose the best model for your task.
Syntax
To compare ML models, follow these steps:
- fit(): Train each model on training data.
- predict(): Get predictions on test data.
- Use evaluation metrics like accuracy_score for classification or mean_squared_error for regression.
- Compare metric values to decide which model performs better.
```python
from sklearn.metrics import accuracy_score

# Assumes model, X_train, y_train, X_test, and y_test are already defined

# Train model
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
score = accuracy_score(y_test, predictions)
```
Example
This example compares two classification models, Logistic Regression and Decision Tree, on the Iris dataset using accuracy score.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize models
log_reg = LogisticRegression(max_iter=200)
dec_tree = DecisionTreeClassifier()

# Train models
log_reg.fit(X_train, y_train)
dec_tree.fit(X_train, y_train)

# Predict
pred_log_reg = log_reg.predict(X_test)
pred_dec_tree = dec_tree.predict(X_test)

# Evaluate
acc_log_reg = accuracy_score(y_test, pred_log_reg)
acc_dec_tree = accuracy_score(y_test, pred_dec_tree)

print(f"Logistic Regression Accuracy: {acc_log_reg:.2f}")
print(f"Decision Tree Accuracy: {acc_dec_tree:.2f}")
```
Output
Logistic Regression Accuracy: 1.00
Decision Tree Accuracy: 0.98
Because DecisionTreeClassifier is not given a random_state here, its exact accuracy may vary slightly between runs.
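The same pattern works for regression, where lower mean_squared_error is better. Here is a minimal sketch comparing LinearRegression and DecisionTreeRegressor on sklearn's built-in diabetes dataset (the dataset and model choices are illustrative, not part of the example above):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load a built-in regression dataset
X, y = load_diabetes(return_X_y=True)

# Use the same split for both models so the comparison is fair
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train both models
lin_reg = LinearRegression()
tree_reg = DecisionTreeRegressor(random_state=42)
lin_reg.fit(X_train, y_train)
tree_reg.fit(X_train, y_train)

# Evaluate: lower MSE means better predictions
mse_lin = mean_squared_error(y_test, lin_reg.predict(X_test))
mse_tree = mean_squared_error(y_test, tree_reg.predict(X_test))

print(f"Linear Regression MSE: {mse_lin:.2f}")
print(f"Decision Tree MSE: {mse_tree:.2f}")
```

Note that the comparison direction flips for error metrics: the model with the lower MSE wins, whereas with accuracy the higher score wins.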
Common Pitfalls
Common mistakes when comparing ML models include:
- Using different train/test splits for each model, which makes comparison unfair.
- Comparing models with different evaluation metrics that don't fit the task.
- Ignoring randomness by not setting random_state, causing inconsistent results.
- Overfitting by evaluating on training data instead of separate test data.
```python
from sklearn.model_selection import train_test_split

# Wrong: a different split for each model makes the comparison unfair
X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, test_size=0.3, random_state=1)
X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.3, random_state=2)

# Right: evaluate every model on the same split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
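A single train/test split can also be misleading on small datasets. One way to reduce that sensitivity, sketched below with sklearn's cross_val_score, is to average each model's accuracy over several folds of the same data (the fold count cv=5 is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Score each model on the same 5 folds of the data
log_scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)

# Report mean accuracy and spread across folds
print(f"Logistic Regression: {log_scores.mean():.2f} (+/- {log_scores.std():.2f})")
print(f"Decision Tree: {tree_scores.mean():.2f} (+/- {tree_scores.std():.2f})")
```

Because both models see the same folds, the averaged scores are directly comparable, and the standard deviation shows how much each model's accuracy depends on the particular split.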
Quick Reference
| Step | Description | Example Function |
|---|---|---|
| Train model | Fit model on training data | model.fit(X_train, y_train) |
| Predict | Get predictions on test data | model.predict(X_test) |
| Evaluate | Calculate performance metric | accuracy_score(y_test, y_pred) |
| Compare | Check metric values side-by-side | Compare accuracy or error values |
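The steps in the table above scale to any number of models with a simple loop: one shared split, one metric, and a dictionary of candidates. This sketch adds a third model (KNeighborsClassifier, an illustrative choice not used earlier) to show the pattern:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Candidate models, all evaluated on the same split with the same metric
models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "k-Nearest Neighbors": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Print models from best to worst accuracy
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```

Collecting scores in a dictionary makes the side-by-side comparison explicit and easy to extend with more models or metrics.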
Key Takeaways
- Train all models on the same train/test split for a fair comparison.
- Use appropriate metrics, like accuracy for classification or MSE for regression.
- Set random_state to ensure reproducible splits and results.
- Evaluate models on unseen test data to avoid overfitting bias.
- Compare metric scores side-by-side to select the best model.