How to Evaluate a Classification Model in Python with sklearn
To evaluate a classification model in Python, use sklearn.metrics functions such as accuracy_score, precision_score, recall_score, and confusion_matrix. These metrics compare the model's predicted labels with the true labels to measure performance.
Syntax
Here are common sklearn functions to evaluate classification models:
- accuracy_score(y_true, y_pred): Returns the fraction of correct predictions.
- precision_score(y_true, y_pred): Measures how many predicted positives are actually positive.
- recall_score(y_true, y_pred): Measures how many actual positives were correctly predicted.
- confusion_matrix(y_true, y_pred): Shows counts of true positives, false positives, true negatives, and false negatives.
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# y_true: true labels
# y_pred: predicted labels
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
```
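For instance, with a small set of hand-made labels (made up here purely to illustrate the arithmetic), the functions can be called directly:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Hand-made labels for illustration only
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(accuracy_score(y_true, y_pred))    # 4 of 5 predictions correct -> 0.8
print(precision_score(y_true, y_pred))   # 2 of 2 predicted positives correct -> 1.0
print(recall_score(y_true, y_pred))      # 2 of 3 actual positives found -> ~0.667
print(confusion_matrix(y_true, y_pred))  # [[2 0]
                                         #  [1 2]]
```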
Example
This example shows how to train a simple classifier and evaluate it using accuracy, precision, recall, and confusion matrix.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Load data
iris = load_iris()
X = iris.data

# Use only two classes for binary classification
y = (iris.target == 0).astype(int)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print("Confusion Matrix:\n", cm)
Output
```
Accuracy: 1.00
Precision: 1.00
Recall: 1.00
Confusion Matrix:
 [[26  0]
 [ 0 19]]
```
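In sklearn's convention, rows of the confusion matrix are true labels and columns are predicted labels, both in sorted label order. A small sketch with made-up labels shows the layout:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
# Row i = true class i, column j = predicted class j, so for binary labels:
# cm[0][0] = TN, cm[0][1] = FP, cm[1][0] = FN, cm[1][1] = TP
print(cm)  # [[1 1]
           #  [1 1]]
```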
Common Pitfalls
Common mistakes when evaluating classification models include:
- Using accuracy alone on imbalanced data can be misleading.
- Not specifying the average parameter for precision/recall in multi-class problems.
- Confusing predicted labels with probabilities; these metrics require labels.
- Evaluating on training data instead of separate test data.
```python
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2]
y_pred = [0, 2, 1, 2]

# Wrong: the default average='binary' raises a ValueError on multi-class data
# precision_score(y_true, y_pred)

# Right: specify average='macro' or average='weighted'
precision_score(y_true, y_pred, average='macro')  # 0.5
```
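The labels-vs-probabilities pitfall can be sketched as follows (make_classification is used here only as stand-in data): predict_proba returns continuous scores, which must be thresholded into hard labels before they are passed to these metrics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in data for illustration
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=200).fit(X, y)

proba = model.predict_proba(X)[:, 1]  # probabilities in [0, 1] -- not valid metric input
labels = (proba >= 0.5).astype(int)   # threshold into hard labels first

# accuracy_score(y, proba) would raise a ValueError; labels work:
print(accuracy_score(y, labels))
```

For binary problems, model.predict(X) applies the same 0.5 threshold, so thresholding by hand is only needed when you want a non-default cutoff.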
Quick Reference
| Metric | Description | Function | Notes |
|---|---|---|---|
| Accuracy | Fraction of correct predictions | accuracy_score(y_true, y_pred) | Good for balanced classes |
| Precision | Correct positive predictions / total predicted positives | precision_score(y_true, y_pred) | Use average param for multi-class |
| Recall | Correct positive predictions / total actual positives | recall_score(y_true, y_pred) | Use average param for multi-class |
| Confusion Matrix | Counts of TP, FP, TN, FN | confusion_matrix(y_true, y_pred) | Shows detailed error types |
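The formulas in the table can be checked directly against confusion_matrix output: for a binary problem, ravel() unpacks the matrix as (tn, fp, fn, tp). A sketch with hand-made labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Hand-made labels for illustration only
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

assert accuracy_score(y_true, y_pred) == (tp + tn) / (tp + tn + fp + fn)  # 0.75
assert precision_score(y_true, y_pred) == tp / (tp + fp)                  # 0.75
assert recall_score(y_true, y_pred) == tp / (tp + fn)                     # 0.75
```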
Key Takeaways
- Use sklearn.metrics functions like accuracy_score, precision_score, recall_score, and confusion_matrix to evaluate classification models.
- Always evaluate on separate test data to get unbiased performance estimates.
- Accuracy can be misleading on imbalanced datasets; consider precision and recall.
- Specify the average parameter for precision and recall when working with multi-class classification.
- The confusion matrix helps you understand the types of prediction errors in detail.
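The imbalanced-data takeaway can be demonstrated with synthetic labels: a model that always predicts the majority class still scores high accuracy, while its recall exposes that it misses every positive.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Synthetic imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # always predicts the majority class

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks strong
print(recall_score(y_true, y_pred))    # 0.0  -- misses every positive
```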