
Classification evaluation (accuracy, precision, recall, F1) in ML Python

Introduction

Classification evaluation checks how well a model predicts categories. It tells us whether the model is good enough or needs improvement. Typical situations:

When you want to see how many emails your spam filter correctly marks as spam or not spam.
When checking whether a medical test correctly identifies sick and healthy patients.
When evaluating whether a model correctly classifies images of cats and dogs.
When comparing different models to pick the best one for sorting customer reviews as positive or negative.
Syntax
ML Python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(true_labels, predicted_labels)
precision = precision_score(true_labels, predicted_labels)
recall = recall_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

accuracy_score measures the fraction of all predictions that are correct.

precision_score measures how many predicted positives are actually positive.

recall_score measures how many actual positives were found.

f1_score combines precision and recall into a single number (their harmonic mean).
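To see what these definitions mean concretely, here is a minimal sketch (assuming binary 0/1 labels, with made-up data) that counts the four prediction outcomes by hand and checks the results against scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

true_labels      = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Count the four possible outcomes of a binary prediction.
tp = sum(t == 1 and p == 1 for t, p in zip(true_labels, predicted_labels))  # true positives
tn = sum(t == 0 and p == 0 for t, p in zip(true_labels, predicted_labels))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(true_labels, predicted_labels))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(true_labels, predicted_labels))  # false negatives

accuracy  = (tp + tn) / (tp + tn + fp + fn)        # correct guesses / all guesses
precision = tp / (tp + fp)                         # predicted positives that are real
recall    = tp / (tp + fn)                         # real positives that were found
f1        = 2 * precision * recall / (precision + recall)

# The hand-computed values agree with scikit-learn (up to float rounding).
assert abs(accuracy  - accuracy_score(true_labels, predicted_labels))  < 1e-9
assert abs(precision - precision_score(true_labels, predicted_labels)) < 1e-9
assert abs(recall    - recall_score(true_labels, predicted_labels))    < 1e-9
assert abs(f1        - f1_score(true_labels, predicted_labels))        < 1e-9
```

The four counts (tp, tn, fp, fn) form the confusion matrix; every metric in this lesson is just a different ratio of those counts.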

Examples
Each example assumes the import shown in the Syntax section above.

Calculates accuracy for 4 predictions.
ML Python
accuracy = accuracy_score([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct -> 0.75
Calculates precision for binary labels.
ML Python
precision = precision_score([1, 0, 1, 1], [1, 0, 0, 1])  # 2 of 2 predicted positives are real -> 1.0
Calculates recall for binary labels.
ML Python
recall = recall_score([1, 0, 1, 1], [1, 0, 0, 1])  # 2 of 3 actual positives found -> about 0.67
Calculates F1 score for binary labels.
ML Python
f1 = f1_score([1, 0, 1, 1], [1, 0, 0, 1])  # harmonic mean of 1.0 and 0.67 -> 0.8
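The examples above use binary 0/1 labels, which is what precision_score, recall_score, and f1_score assume by default (average='binary'). With more than two classes you must pass an averaging strategy; a small sketch with made-up three-class labels:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

true_labels      = [0, 1, 2, 2, 1, 0]
predicted_labels = [0, 2, 2, 2, 1, 0]

# average='macro' computes the metric separately for each class,
# then takes the unweighted mean across classes.
precision = precision_score(true_labels, predicted_labels, average="macro")
recall    = recall_score(true_labels, predicted_labels, average="macro")
f1        = f1_score(true_labels, predicted_labels, average="macro")
```

Other options include average='micro' (pool all predictions before computing) and average='weighted' (weight each class by how often it occurs).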
Sample Program

This program compares true labels and predicted labels, then prints four common evaluation scores to see how well the model did.

ML Python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# True labels (actual categories)
true_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

# Predicted labels by the model
predicted_labels = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Calculate evaluation metrics
accuracy = accuracy_score(true_labels, predicted_labels)
precision = precision_score(true_labels, predicted_labels)
recall = recall_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
Output
Accuracy: 0.80
Precision: 0.80
Recall: 0.80
F1 Score: 0.80
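Instead of calling four functions separately, scikit-learn can also summarize everything in one call. A brief sketch using the same labels as the sample program:

```python
from sklearn.metrics import classification_report

true_labels      = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# One call builds a text table with precision, recall, and F1
# for every class, plus overall accuracy.
report = classification_report(true_labels, predicted_labels)
print(report)
```

This is handy when you just want a quick overview rather than individual numbers for further computation.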
Important Notes

Accuracy can be misleading when classes are imbalanced (one class is far more common than the other).

Precision is important when false positives are costly (e.g., wrongly flagging emails as spam).

Recall is important when missing positives is costly (e.g., missing sick patients).
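The imbalance warning above is easy to demonstrate. In this made-up example, 95 of 100 patients are healthy and the model simply predicts "healthy" for everyone: accuracy looks excellent while recall exposes the failure.

```python
from sklearn.metrics import accuracy_score, recall_score

# 95 healthy (0) and 5 sick (1) patients; the model predicts "healthy" for all.
true_labels      = [0] * 95 + [1] * 5
predicted_labels = [0] * 100

accuracy = accuracy_score(true_labels, predicted_labels)  # high, yet useless
recall   = recall_score(true_labels, predicted_labels)    # every sick patient missed
```

Here accuracy is 0.95 even though the model never detects a sick patient (recall is 0.0), which is why imbalanced problems should always be checked with precision, recall, or F1 as well.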

Summary

Accuracy shows overall correct predictions.

Precision and recall focus on positive class quality.

F1 score balances precision and recall into one number.