
How to Evaluate a Classification Model in Python with sklearn

To evaluate a classification model in Python, use sklearn.metrics functions such as accuracy_score, precision_score, recall_score, and confusion_matrix. These metrics compare the model's predicted labels with the true labels to measure performance.
📐

Syntax

Here are common sklearn functions to evaluate classification models:

  • accuracy_score(y_true, y_pred): Returns the fraction of correct predictions.
  • precision_score(y_true, y_pred): Measures how many predicted positives are actually positive.
  • recall_score(y_true, y_pred): Measures how many actual positives were correctly predicted.
  • confusion_matrix(y_true, y_pred): Shows counts of true positives, false positives, true negatives, and false negatives.
python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# y_true: true labels
# y_pred: predicted labels

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
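As a quick sanity check of these functions, they can be called on small hand-made label lists (the values below are made up purely for illustration):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Hand-made labels, purely for illustration
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)    # 5 of 6 correct -> 0.833...
precision = precision_score(y_true, y_pred)  # 3 predicted positives, all correct -> 1.0
recall = recall_score(y_true, y_pred)        # 3 of 4 actual positives found -> 0.75
cm = confusion_matrix(y_true, y_pred)        # rows = true class, columns = predicted class
print(accuracy, precision, recall)
print(cm)
```

Note that in the confusion matrix, rows correspond to true classes and columns to predicted classes, so off-diagonal entries are errors.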
💻

Example

This example shows how to train a simple classifier and evaluate it using accuracy, precision, recall, and confusion matrix.

python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Load data
iris = load_iris()
X = iris.data
# Use only two classes for binary classification
y = (iris.target == 0).astype(int)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print("Confusion Matrix:\n", cm)
Output
Accuracy: 1.00
Precision: 1.00
Recall: 1.00
Confusion Matrix:
 [[16  0]
 [ 0 19]]
⚠️

Common Pitfalls

Common mistakes when evaluating classification models include:

  • Relying on accuracy alone on imbalanced data, where it can be misleading.
  • Forgetting to set the average parameter for precision and recall in multi-class problems.
  • Passing predicted probabilities instead of labels; these metrics expect hard labels.
  • Evaluating on training data instead of a held-out test set.
python
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2]
y_pred = [0, 2, 1, 2]

# Wrong: on multi-class data the default average='binary' raises a ValueError
# precision_score(y_true, y_pred)

# Right: specify average='macro' or average='weighted'
print(precision_score(y_true, y_pred, average='macro'))
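The first pitfall can be made concrete with a small synthetic example (the 95/5 class split below is made up): a "model" that always predicts the majority class scores 95% accuracy yet finds none of the positives.

```python
from sklearn.metrics import accuracy_score, recall_score

# Synthetic imbalanced labels (made-up split): 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A useless "model" that always predicts the majority class
y_pred = [0] * 100

accuracy = accuracy_score(y_true, y_pred)  # 0.95 -- looks impressive
recall = recall_score(y_true, y_pred)      # 0.0  -- every positive was missed
print(accuracy, recall)
```

This is why recall (and precision) should accompany accuracy whenever classes are imbalanced.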
📊

Quick Reference

Metric           | Description                                              | Function                         | Notes
Accuracy         | Fraction of correct predictions                          | accuracy_score(y_true, y_pred)   | Good for balanced classes
Precision        | Correct positive predictions / total predicted positives | precision_score(y_true, y_pred)  | Use average param for multi-class
Recall           | Correct positive predictions / total actual positives    | recall_score(y_true, y_pred)     | Use average param for multi-class
Confusion Matrix | Counts of TP, FP, TN, FN                                 | confusion_matrix(y_true, y_pred) | Shows detailed error types
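The metrics above all take hard labels. Probability-based metrics such as roc_auc_score instead take scores, which relates to the probabilities-vs-labels pitfall; as a hedged sketch with illustrative numbers:

```python
from sklearn.metrics import roc_auc_score

# Illustrative labels and scores; in practice y_scores might come from
# model.predict_proba(X_test)[:, 1]
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_scores)  # 0.75
print(auc)
```

Passing hard 0/1 labels to roc_auc_score works but throws away the ranking information that the score is designed to measure.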

Key Takeaways

  • Use sklearn.metrics functions like accuracy_score, precision_score, recall_score, and confusion_matrix to evaluate classification models.
  • Always evaluate on separate test data to get unbiased performance estimates.
  • Accuracy can be misleading on imbalanced datasets; consider precision and recall.
  • Specify the average parameter for precision and recall when working with multi-class classification.
  • The confusion matrix helps you understand the types of prediction errors in detail.