
Evaluation metrics (accuracy, F1, confusion matrix) in NLP

Introduction

We use evaluation metrics to measure how well a model performs. They tell us whether the model's predictions are reliable and where it makes mistakes.

Checking if a spam filter correctly identifies spam emails.
Measuring how well a model classifies positive and negative movie reviews.
Evaluating a model that detects diseases from medical reports.
Comparing different models to pick the best one for a task.
Understanding mistakes a model makes to improve it.
Syntax
Python
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# accuracy = accuracy_score(true_labels, predicted_labels)
# f1 = f1_score(true_labels, predicted_labels, average='binary')
# cm = confusion_matrix(true_labels, predicted_labels)

accuracy_score measures the fraction of predictions that match the true labels.

f1_score balances precision and recall, which is useful when classes are uneven.

confusion_matrix returns a table of counts comparing true labels (rows) with predicted labels (columns).
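To see why accuracy alone can mislead, here is a small sketch with made-up labels (nine negatives, one positive): a model that always predicts the majority class still scores 90% accuracy, but its F1 score is zero because it never finds the positive class.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced data: 9 negatives, 1 positive.
true_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A model that always predicts the majority class 0.
predicted_labels = [0] * 10

print(accuracy_score(true_labels, predicted_labels))                  # 0.9
print(f1_score(true_labels, predicted_labels, zero_division=0))       # 0.0
```

The zero_division=0 argument keeps f1_score from warning when the model predicts no positives at all.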

Examples
Calculate accuracy, F1 score, and confusion matrix for a small example.
Python
accuracy = accuracy_score([1,0,1,1], [1,0,0,1])   # 0.75 (3 of 4 correct)
f1 = f1_score([1,0,1,1], [1,0,0,1])               # ≈ 0.8
cm = confusion_matrix([1,0,1,1], [1,0,0,1])       # [[1 0], [1 2]]
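As a sanity check, the F1 score for this example can be derived by hand from precision and recall, and it matches what f1_score returns:

```python
from sklearn.metrics import f1_score

true_labels = [1, 0, 1, 1]
predicted_labels = [1, 0, 0, 1]

# Count true positives, false positives, and false negatives directly.
tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 1)  # 2
fp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 0 and p == 1)  # 0
fn = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 0)  # 1

precision = tp / (tp + fp)   # 1.0
recall = tp / (tp + fn)      # 2/3
f1_manual = 2 * precision * recall / (precision + recall)   # ≈ 0.8

print(f1_manual)
print(f1_score(true_labels, predicted_labels))
```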
Calculate F1 score for multi-class data using macro average.
Python
f1_macro = f1_score([0,1,2,2], [0,2,1,2], average='macro')   # 0.5
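The macro average is just the unweighted mean of the per-class F1 scores. A quick sketch with the same labels shows where that 0.5 comes from, using average=None to get one score per class:

```python
from sklearn.metrics import f1_score

true_labels = [0, 1, 2, 2]
predicted_labels = [0, 2, 1, 2]

# average=None returns one F1 score per class instead of a single number.
per_class = f1_score(true_labels, predicted_labels, average=None, zero_division=0)
print(per_class)   # class 0: 1.0, class 1: 0.0, class 2: 0.5

# 'macro' is the plain mean of those scores: (1.0 + 0.0 + 0.5) / 3 = 0.5
print(f1_score(true_labels, predicted_labels, average='macro', zero_division=0))
```

Other choices include average='micro' (pool all decisions across classes) and average='weighted' (weight each class by its frequency).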
Sample Model

This program shows how to calculate accuracy, F1 score, and confusion matrix for a simple binary classification example.

Python
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# True labels for 8 samples
true_labels = [0, 1, 0, 1, 0, 1, 1, 0]
# Model predictions
predicted_labels = [0, 0, 0, 1, 0, 1, 0, 1]

# Calculate accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
# Calculate F1 score (binary)
f1 = f1_score(true_labels, predicted_labels)
# Calculate confusion matrix
cm = confusion_matrix(true_labels, predicted_labels)

print(f"Accuracy: {accuracy:.2f}")
print(f"F1 Score: {f1:.2f}")
print("Confusion Matrix:")
print(cm)
Output
Accuracy: 0.62
F1 Score: 0.57
Confusion Matrix:
[[3 1]
 [2 2]]
Important Notes

Accuracy can be misleading if classes are imbalanced.

F1 score is better when you care about both false positives and false negatives.

The confusion matrix shows counts of true negatives, false positives, false negatives, and true positives.
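Those four counts can be unpacked from the matrix directly, and the other metrics recomputed from them. A sketch using the same 8-sample data as the program above:

```python
from sklearn.metrics import confusion_matrix

true_labels = [0, 1, 0, 1, 0, 1, 1, 0]
predicted_labels = [0, 0, 0, 1, 0, 1, 0, 1]

# For binary labels, ravel() flattens the 2x2 matrix in the order
# tn, fp, fn, tp (rows are true classes, columns are predicted classes).
tn, fp, fn, tp = confusion_matrix(true_labels, predicted_labels).ravel()
print(tn, fp, fn, tp)   # 3 1 2 2

# Accuracy is the diagonal (correct predictions) over the total.
accuracy = (tn + tp) / (tn + fp + fn + tp)
print(accuracy)         # 0.625
```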

Summary

Accuracy tells how many predictions were correct overall.

F1 score balances precision and recall, useful for uneven classes.

Confusion matrix helps see where the model makes mistakes.