Prompt Engineering / GenAI · ~20 mins

Automated evaluation metrics in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Automated evaluation metrics
Problem: You have trained a text classification model, but you only checked accuracy. You want a fuller picture of its performance using automated evaluation metrics.
Current Metrics: Accuracy: 78%
Issue: Accuracy alone does not show whether the model finds all actual positive cases or how often it raises false alarms.
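To see why accuracy can mislead, consider a small sketch (hypothetical labels, not this experiment's data): on imbalanced data, a classifier that always predicts the majority class still scores high accuracy while finding zero positives.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced dataset: 9 negatives, 1 positive
true_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
pred_labels = [0] * 10  # a lazy model that always predicts "negative"

print(accuracy_score(true_labels, pred_labels))  # 0.9 -- looks good
print(recall_score(true_labels, pred_labels))    # 0.0 -- finds no positives
```

Despite 90% accuracy, the model is useless for detecting the positive class, which is exactly what recall exposes.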
Your Task
Calculate precision, recall, and F1-score for the model predictions to better understand its performance.
Use sklearn.metrics functions only.
Do not change the model or data.
Use the given true labels and predicted labels.
Solution
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

# Example true labels and predicted labels
true_labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
pred_labels = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

# Calculate precision
precision = precision_score(true_labels, pred_labels)

# Calculate recall
recall = recall_score(true_labels, pred_labels)

# Calculate F1-score
f1 = f1_score(true_labels, pred_labels)

# Print all metrics
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")

# Alternatively, print classification report
print("\nClassification Report:\n", classification_report(true_labels, pred_labels))
Added code to compute precision, recall, and F1-score using sklearn.metrics.
Used example true and predicted labels to demonstrate the calculations.
Printed each metric, plus a full classification report.
Results Interpretation

Before: Only accuracy was known (78%).

After: Precision is 75%, meaning 75% of predicted positives are correct. Recall is 60%, meaning the model found 60% of all actual positives. F1-score is 67%, balancing both.

Accuracy alone can be misleading. Using precision, recall, and F1-score gives a fuller picture of model performance, especially for imbalanced data or when false positives and false negatives have different costs.
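As a check, the three scores above can be derived by hand from the confusion matrix; this sketch recomputes them for the same example labels used in the solution.

```python
from sklearn.metrics import confusion_matrix

true_labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
pred_labels = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(true_labels, pred_labels).ravel()

precision = tp / (tp + fp)  # 3 / (3 + 1) = 0.75
recall = tp / (tp + fn)     # 3 / (3 + 2) = 0.60
f1 = 2 * precision * recall / (precision + recall)  # ~0.67

print(tn, fp, fn, tp)                         # 4 1 2 3
print(precision, recall, round(f1, 2))        # 0.75 0.6 0.67
```

These match the 75% / 60% / 67% figures from sklearn's built-in scorers, confirming the interpretation above.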
Bonus Experiment
Try calculating these metrics for a multi-class classification problem using sklearn's classification_report.
💡 Hint
Use the 'target_names' parameter in classification_report to label classes and observe per-class metrics.
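A minimal multi-class sketch (hypothetical labels and class names) showing classification_report with target_names:

```python
from sklearn.metrics import classification_report

# Hypothetical 3-class labels: 0, 1, 2
true_labels = [0, 1, 2, 2, 1, 0, 2, 1, 0]
pred_labels = [0, 1, 2, 1, 1, 0, 2, 0, 0]

# target_names maps class indices 0/1/2 to readable row labels
report = classification_report(
    true_labels, pred_labels,
    target_names=["negative", "neutral", "positive"],
)
print(report)  # per-class precision/recall/F1 plus macro and weighted averages
```

Each row of the report shows precision, recall, and F1 for one class, so you can spot which classes the model handles poorly even when overall accuracy looks fine.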