
Classification evaluation (accuracy, precision, recall, F1) in ML Python

Introduction

Classification evaluation checks how well a model predicts categories. It tells us whether the model is good enough or needs improvement. Typical situations:

When you want to see how many emails your spam filter correctly marks as spam or not spam.
When checking whether a medical test correctly identifies sick and healthy patients.
When evaluating whether a model correctly classifies images of cats and dogs.
When comparing different models to pick the best one for sorting customer reviews as positive or negative.
Syntax
ML Python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(true_labels, predicted_labels)
precision = precision_score(true_labels, predicted_labels)
recall = recall_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

accuracy_score measures the fraction of all predictions that are correct.

precision_score measures how many predicted positives are actually positive.

recall_score measures how many actual positives were found.

f1_score combines precision and recall into a single number (their harmonic mean).
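To see what these definitions mean concretely, here is a minimal sketch (assuming binary 0/1 labels, with made-up data) that counts the four prediction outcomes by hand and checks the results against scikit-learn:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

true_labels      = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Count the four possible outcomes of a binary prediction.
tp = sum(t == 1 and p == 1 for t, p in zip(true_labels, predicted_labels))  # true positives
tn = sum(t == 0 and p == 0 for t, p in zip(true_labels, predicted_labels))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(true_labels, predicted_labels))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(true_labels, predicted_labels))  # false negatives

accuracy  = (tp + tn) / (tp + tn + fp + fn)        # correct guesses / all guesses
precision = tp / (tp + fp)                         # predicted positives that are real
recall    = tp / (tp + fn)                         # real positives that were found
f1        = 2 * precision * recall / (precision + recall)

# The hand-computed values agree with scikit-learn (up to float rounding).
assert abs(accuracy  - accuracy_score(true_labels, predicted_labels))  < 1e-9
assert abs(precision - precision_score(true_labels, predicted_labels)) < 1e-9
assert abs(recall    - recall_score(true_labels, predicted_labels))    < 1e-9
assert abs(f1        - f1_score(true_labels, predicted_labels))        < 1e-9
```

The four counts (tp, tn, fp, fn) form the confusion matrix; every metric in this lesson is just a different ratio of those counts.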

Examples
Each example assumes the import shown in the Syntax section above.

Calculates accuracy for 4 predictions.
ML Python
accuracy = accuracy_score([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct -> 0.75
Calculates precision for binary labels.
ML Python
precision = precision_score([1, 0, 1, 1], [1, 0, 0, 1])  # 2 of 2 predicted positives are real -> 1.0
Calculates recall for binary labels.
ML Python
recall = recall_score([1, 0, 1, 1], [1, 0, 0, 1])  # 2 of 3 actual positives found -> about 0.67
Calculates F1 score for binary labels.
ML Python
f1 = f1_score([1, 0, 1, 1], [1, 0, 0, 1])  # harmonic mean of 1.0 and 0.67 -> 0.8
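The examples above use binary 0/1 labels, which is what precision_score, recall_score, and f1_score assume by default (average='binary'). With more than two classes you must pass an averaging strategy; a small sketch with made-up three-class labels:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

true_labels      = [0, 1, 2, 2, 1, 0]
predicted_labels = [0, 2, 2, 2, 1, 0]

# average='macro' computes the metric separately for each class,
# then takes the unweighted mean across classes.
precision = precision_score(true_labels, predicted_labels, average="macro")
recall    = recall_score(true_labels, predicted_labels, average="macro")
f1        = f1_score(true_labels, predicted_labels, average="macro")
```

Other options include average='micro' (pool all predictions before computing) and average='weighted' (weight each class by how often it occurs).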
Sample Program

This program compares true labels and predicted labels, then prints four common evaluation scores to see how well the model did.

ML Python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# True labels (actual categories)
true_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

# Predicted labels by the model
predicted_labels = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Calculate evaluation metrics
accuracy = accuracy_score(true_labels, predicted_labels)
precision = precision_score(true_labels, predicted_labels)
recall = recall_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
Output
Accuracy: 0.80
Precision: 0.80
Recall: 0.80
F1 Score: 0.80
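Instead of calling four functions separately, scikit-learn can also summarize everything in one call. A brief sketch using the same labels as the sample program:

```python
from sklearn.metrics import classification_report

true_labels      = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# One call builds a text table with precision, recall, and F1
# for every class, plus overall accuracy.
report = classification_report(true_labels, predicted_labels)
print(report)
```

This is handy when you just want a quick overview rather than individual numbers for further computation.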
Important Notes

Accuracy can be misleading when classes are imbalanced (one class is far more common than the other).

Precision is important when false positives are costly (e.g., wrongly flagging emails as spam).

Recall is important when missing positives is costly (e.g., missing sick patients).
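The imbalance warning above is easy to demonstrate. In this made-up example, 95 of 100 patients are healthy and the model simply predicts "healthy" for everyone: accuracy looks excellent while recall exposes the failure.

```python
from sklearn.metrics import accuracy_score, recall_score

# 95 healthy (0) and 5 sick (1) patients; the model predicts "healthy" for all.
true_labels      = [0] * 95 + [1] * 5
predicted_labels = [0] * 100

accuracy = accuracy_score(true_labels, predicted_labels)  # high, yet useless
recall   = recall_score(true_labels, predicted_labels)    # every sick patient missed
```

Here accuracy is 0.95 even though the model never detects a sick patient (recall is 0.0), which is why imbalanced problems should always be checked with precision, recall, or F1 as well.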

Summary

Accuracy shows overall correct predictions.

Precision and recall focus on positive class quality.

F1 score balances precision and recall into one number.