
Evaluation metrics (accuracy, F1, confusion matrix) in NLP

Introduction

We use evaluation metrics to measure how well a model performs. They tell us whether the model's predictions are reliable and where it makes mistakes.

Checking if a spam filter correctly identifies spam emails.
Measuring how well a model classifies positive and negative movie reviews.
Evaluating a model that detects diseases from medical reports.
Comparing different models to pick the best one for a task.
Understanding mistakes a model makes to improve it.
Syntax
Python
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# accuracy = accuracy_score(true_labels, predicted_labels)
# f1 = f1_score(true_labels, predicted_labels, average='binary')
# cm = confusion_matrix(true_labels, predicted_labels)

accuracy_score measures the fraction of predictions that match the true labels.

f1_score balances precision and recall, which is useful when classes are uneven.

confusion_matrix returns a table of counts comparing true labels (rows) with predicted labels (columns).
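To see why accuracy alone can mislead, here is a small sketch with made-up labels (nine negatives, one positive): a model that always predicts the majority class still scores 90% accuracy, but its F1 score is zero because it never finds the positive class.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced data: 9 negatives, 1 positive.
true_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A model that always predicts the majority class 0.
predicted_labels = [0] * 10

print(accuracy_score(true_labels, predicted_labels))                  # 0.9
print(f1_score(true_labels, predicted_labels, zero_division=0))       # 0.0
```

The zero_division=0 argument keeps f1_score from warning when the model predicts no positives at all.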

Examples
Calculate accuracy, F1 score, and confusion matrix for a small example.
Python
accuracy = accuracy_score([1,0,1,1], [1,0,0,1])   # 0.75 (3 of 4 correct)
f1 = f1_score([1,0,1,1], [1,0,0,1])               # ≈ 0.8
cm = confusion_matrix([1,0,1,1], [1,0,0,1])       # [[1 0], [1 2]]
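As a sanity check, the F1 score for this example can be derived by hand from precision and recall, and it matches what f1_score returns:

```python
from sklearn.metrics import f1_score

true_labels = [1, 0, 1, 1]
predicted_labels = [1, 0, 0, 1]

# Count true positives, false positives, and false negatives directly.
tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 1)  # 2
fp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 0 and p == 1)  # 0
fn = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 0)  # 1

precision = tp / (tp + fp)   # 1.0
recall = tp / (tp + fn)      # 2/3
f1_manual = 2 * precision * recall / (precision + recall)   # ≈ 0.8

print(f1_manual)
print(f1_score(true_labels, predicted_labels))
```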
Calculate F1 score for multi-class data using macro average.
Python
f1_macro = f1_score([0,1,2,2], [0,2,1,2], average='macro')   # 0.5
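The macro average is just the unweighted mean of the per-class F1 scores. A quick sketch with the same labels shows where that 0.5 comes from, using average=None to get one score per class:

```python
from sklearn.metrics import f1_score

true_labels = [0, 1, 2, 2]
predicted_labels = [0, 2, 1, 2]

# average=None returns one F1 score per class instead of a single number.
per_class = f1_score(true_labels, predicted_labels, average=None, zero_division=0)
print(per_class)   # class 0: 1.0, class 1: 0.0, class 2: 0.5

# 'macro' is the plain mean of those scores: (1.0 + 0.0 + 0.5) / 3 = 0.5
print(f1_score(true_labels, predicted_labels, average='macro', zero_division=0))
```

Other choices include average='micro' (pool all decisions across classes) and average='weighted' (weight each class by its frequency).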
Sample Model

This program shows how to calculate accuracy, F1 score, and confusion matrix for a simple binary classification example.

Python
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# True labels for 8 samples
true_labels = [0, 1, 0, 1, 0, 1, 1, 0]
# Model predictions
predicted_labels = [0, 0, 0, 1, 0, 1, 0, 1]

# Calculate accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
# Calculate F1 score (binary)
f1 = f1_score(true_labels, predicted_labels)
# Calculate confusion matrix
cm = confusion_matrix(true_labels, predicted_labels)

print(f"Accuracy: {accuracy:.2f}")
print(f"F1 Score: {f1:.2f}")
print("Confusion Matrix:")
print(cm)
Output
Accuracy: 0.62
F1 Score: 0.57
Confusion Matrix:
[[3 1]
 [2 2]]
Important Notes

Accuracy can be misleading if classes are imbalanced.

F1 score is better when you care about both false positives and false negatives.

The confusion matrix shows counts of true negatives, false positives, false negatives, and true positives.
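Those four counts can be unpacked from the matrix directly, and the other metrics recomputed from them. A sketch using the same 8-sample data as the program above:

```python
from sklearn.metrics import confusion_matrix

true_labels = [0, 1, 0, 1, 0, 1, 1, 0]
predicted_labels = [0, 0, 0, 1, 0, 1, 0, 1]

# For binary labels, ravel() flattens the 2x2 matrix in the order
# tn, fp, fn, tp (rows are true classes, columns are predicted classes).
tn, fp, fn, tp = confusion_matrix(true_labels, predicted_labels).ravel()
print(tn, fp, fn, tp)   # 3 1 2 2

# Accuracy is the diagonal (correct predictions) over the total.
accuracy = (tn + tp) / (tn + fp + fn + tp)
print(accuracy)         # 0.625
```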

Summary

Accuracy tells how many predictions were correct overall.

F1 score balances precision and recall, useful for uneven classes.

Confusion matrix helps see where the model makes mistakes.