
Evaluation metrics (accuracy, F1, confusion matrix) in NLP - ML Experiment: Train & Evaluate

Problem: You have trained a text classification model to identify positive and negative movie reviews. The model's training accuracy is 90%, but you want to understand how well it performs on unseen data using evaluation metrics.
Current Metrics: Training accuracy: 90%, Validation accuracy: 85%
Issue: Accuracy alone does not give a full picture of model performance, especially if classes are imbalanced. You need to compute the F1 score and confusion matrix to evaluate the model more thoroughly.
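To see why accuracy alone can mislead, consider a quick sketch with a hypothetical imbalanced validation set (90 negative reviews, 10 positive) and a trivial model that always predicts "negative":

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced validation set: 90 negative, 10 positive reviews
true_labels = [0] * 90 + [1] * 10
# A trivial model that always predicts the majority class "negative"
predicted_labels = [0] * 100

acc = accuracy_score(true_labels, predicted_labels)
# zero_division=0 avoids a warning when no positives are predicted
f1 = f1_score(true_labels, predicted_labels, zero_division=0)

print(f"Accuracy: {acc:.2f}")  # 0.90
print(f"F1 Score: {f1:.2f}")   # 0.00
```

The model scores 90% accuracy while never identifying a single positive review, which the F1 score of 0.00 immediately exposes.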
Your Task
Calculate accuracy, F1 score, and confusion matrix on the validation set to better understand model performance.
Use sklearn metrics functions only.
Do not retrain or change the model.
Use the provided validation predictions and true labels.
Solution
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Example true labels and predicted labels for validation set
true_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

# Calculate accuracy
acc = accuracy_score(true_labels, predicted_labels)

# Calculate F1 score (binary classification)
f1 = f1_score(true_labels, predicted_labels)

# Calculate confusion matrix
cm = confusion_matrix(true_labels, predicted_labels)

print(f"Accuracy: {acc:.2f}")
print(f"F1 Score: {f1:.2f}")
print("Confusion Matrix:")
print(cm)
Added code to compute accuracy, F1 score, and confusion matrix using sklearn.
Used example true and predicted labels to demonstrate metric calculations.
Printed results clearly for easy interpretation.
Results Interpretation

Before: Only accuracy was known (85% on validation).

After: On the sample validation labels, accuracy is 80%, F1 score is 0.80, and the confusion matrix shows 4 true negatives, 4 true positives, 1 false positive, and 1 false negative.

Accuracy alone can be misleading. F1 score and confusion matrix provide deeper insight into model performance, especially for imbalanced classes.
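Reading the confusion matrix correctly depends on knowing sklearn's layout: rows are true labels and columns are predictions. For a binary problem this means ravel() unpacks the matrix as (tn, fp, fn, tp). Using the same example labels as the solution above:

```python
from sklearn.metrics import confusion_matrix

true_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

# sklearn convention: rows = true labels, columns = predicted labels,
# so ravel() on a 2x2 matrix yields (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(true_labels, predicted_labels).ravel()

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=4, FP=1, FN=1, TP=4
```

This unpacking makes it easy to compute any derived metric by hand and to sanity-check the counts reported above.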
Bonus Experiment
Try calculating precision and recall along with the existing metrics to get even more insight.
💡 Hint
Use precision_score and recall_score from sklearn.metrics with the same true and predicted labels.
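As a sketch of the bonus experiment, precision and recall can be computed on the same labels used in the solution:

```python
from sklearn.metrics import precision_score, recall_score

# Same validation labels as in the solution above
true_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

precision = precision_score(true_labels, predicted_labels)  # TP / (TP + FP) = 4/5
recall = recall_score(true_labels, predicted_labels)        # TP / (TP + FN) = 4/5

print(f"Precision: {precision:.2f}")  # 0.80
print(f"Recall: {recall:.2f}")        # 0.80
```

Here precision and recall are both 0.80, which is why the F1 score (their harmonic mean) is also 0.80.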