How to Evaluate a Classification Model in Python with sklearn
To evaluate a classification model in Python, use sklearn.metrics functions such as accuracy_score, precision_score, recall_score, and confusion_matrix. These metrics compare the model's predicted labels with the true labels to measure performance.
Syntax
Here are common sklearn functions to evaluate classification models:
- accuracy_score(y_true, y_pred): Returns the fraction of correct predictions.
- precision_score(y_true, y_pred): Measures how many predicted positives are actually positive.
- recall_score(y_true, y_pred): Measures how many actual positives were correctly predicted.
- confusion_matrix(y_true, y_pred): Shows counts of true positives, false positives, true negatives, and false negatives.
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# y_true: true labels
# y_pred: predicted labels
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
```
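For instance, with a small set of hand-made labels (made up here purely to illustrate the arithmetic), the functions can be called directly:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Hand-made labels for illustration only
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(accuracy_score(y_true, y_pred))    # 4 of 5 predictions correct -> 0.8
print(precision_score(y_true, y_pred))   # 2 of 2 predicted positives correct -> 1.0
print(recall_score(y_true, y_pred))      # 2 of 3 actual positives found -> ~0.667
print(confusion_matrix(y_true, y_pred))  # [[2 0]
                                         #  [1 2]]
```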
Example
This example shows how to train a simple classifier and evaluate it using accuracy, precision, recall, and confusion matrix.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Load data
iris = load_iris()
X = iris.data

# Use only two classes for binary classification
y = (iris.target == 0).astype(int)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print("Confusion Matrix:\n", cm)
Output
```
Accuracy: 1.00
Precision: 1.00
Recall: 1.00
Confusion Matrix:
 [[26  0]
 [ 0 19]]
```
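In sklearn's convention, rows of the confusion matrix are true labels and columns are predicted labels, both in sorted label order. A small sketch with made-up labels shows the layout:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
# Row i = true class i, column j = predicted class j, so for binary labels:
# cm[0][0] = TN, cm[0][1] = FP, cm[1][0] = FN, cm[1][1] = TP
print(cm)  # [[1 1]
           #  [1 1]]
```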
Common Pitfalls
Common mistakes when evaluating classification models include:
- Using accuracy alone on imbalanced data can be misleading.
- Not specifying the average parameter for precision/recall in multi-class problems.
- Confusing predicted labels with probabilities; these metrics require labels.
- Evaluating on training data instead of separate test data.
```python
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2]
y_pred = [0, 2, 1, 2]

# Wrong: the default average='binary' raises a ValueError on multi-class data
# precision_score(y_true, y_pred)

# Right: specify average='macro' or average='weighted'
precision_score(y_true, y_pred, average='macro')  # 0.5
```
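The labels-vs-probabilities pitfall can be sketched as follows (make_classification is used here only as stand-in data): predict_proba returns continuous scores, which must be thresholded into hard labels before they are passed to these metrics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in data for illustration
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=200).fit(X, y)

proba = model.predict_proba(X)[:, 1]  # probabilities in [0, 1] -- not valid metric input
labels = (proba >= 0.5).astype(int)   # threshold into hard labels first

# accuracy_score(y, proba) would raise a ValueError; labels work:
print(accuracy_score(y, labels))
```

For binary problems, model.predict(X) applies the same 0.5 threshold, so thresholding by hand is only needed when you want a non-default cutoff.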
Quick Reference
| Metric | Description | Function | Notes |
|---|---|---|---|
| Accuracy | Fraction of correct predictions | accuracy_score(y_true, y_pred) | Good for balanced classes |
| Precision | Correct positive predictions / total predicted positives | precision_score(y_true, y_pred) | Use average param for multi-class |
| Recall | Correct positive predictions / total actual positives | recall_score(y_true, y_pred) | Use average param for multi-class |
| Confusion Matrix | Counts of TP, FP, TN, FN | confusion_matrix(y_true, y_pred) | Shows detailed error types |
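The formulas in the table can be checked directly against confusion_matrix output: for a binary problem, ravel() unpacks the matrix as (tn, fp, fn, tp). A sketch with hand-made labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Hand-made labels for illustration only
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

assert accuracy_score(y_true, y_pred) == (tp + tn) / (tp + tn + fp + fn)  # 0.75
assert precision_score(y_true, y_pred) == tp / (tp + fp)                  # 0.75
assert recall_score(y_true, y_pred) == tp / (tp + fn)                     # 0.75
```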
Key Takeaways
- Use sklearn.metrics functions like accuracy_score, precision_score, recall_score, and confusion_matrix to evaluate classification models.
- Always evaluate on separate test data to get unbiased performance estimates.
- Accuracy can be misleading on imbalanced datasets; consider precision and recall.
- Specify the average parameter for precision and recall when working with multi-class classification.
- The confusion matrix helps you understand the types of prediction errors in detail.
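The imbalanced-data takeaway can be demonstrated with synthetic labels: a model that always predicts the majority class still scores high accuracy, while its recall exposes that it misses every positive.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Synthetic imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # always predicts the majority class

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks strong
print(recall_score(y_true, y_pred))    # 0.0  -- misses every positive
```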