
What Metrics to Monitor for ML Model Performance and Health

To monitor an ML model, track loss and accuracy during training for overall performance. For classification, also monitor precision, recall, and F1-score to understand error types. For regression, use mean squared error (MSE) or mean absolute error (MAE) to measure prediction quality.
📝

Syntax

Common metrics for ML models are calculated using simple function calls or methods from libraries like scikit-learn or TensorFlow.

  • loss: Measures how far predictions are from true values during training.
  • accuracy: Percentage of correct predictions.
  • precision: How many predicted positives are actually positive.
  • recall: How many actual positives are correctly predicted.
  • F1-score: Harmonic mean of precision and recall.
  • mean squared error (MSE): Average squared difference between predicted and actual values.
  • mean absolute error (MAE): Average absolute difference between predicted and actual values.
python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, mean_squared_error, mean_absolute_error

# Example predictions and true labels
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1]

# Classification metrics
acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Regression example
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.1, 7.8]
mse = mean_squared_error(y_true_reg, y_pred_reg)
mae = mean_absolute_error(y_true_reg, y_pred_reg)
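The snippet above covers the classification and regression metrics but not the first bullet, loss. A common loss for classification is log loss (cross-entropy), which scikit-learn exposes as `log_loss`; this sketch uses illustrative probabilities, not data from a real model:

```python
from sklearn.metrics import log_loss

# Log loss penalizes confident wrong predictions more heavily
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]  # predicted probability of class 1

loss = log_loss(y_true, y_prob)
print(f"Log loss: {loss:.2f}")  # → Log loss: 0.20
```

Lower is better; a perfectly confident, perfectly correct model would score 0.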
💻

Example

This example shows how to calculate key classification and regression metrics using scikit-learn. It demonstrates how to interpret model predictions with these metrics.

python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, mean_squared_error, mean_absolute_error

# Classification example
true_labels = [1, 0, 1, 1, 0, 0, 1]
pred_labels = [1, 0, 0, 1, 0, 1, 1]

accuracy = accuracy_score(true_labels, pred_labels)
precision = precision_score(true_labels, pred_labels)
recall = recall_score(true_labels, pred_labels)
f1 = f1_score(true_labels, pred_labels)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")

# Regression example
true_values = [3.0, -0.5, 2.0, 7.0]
pred_values = [2.5, 0.0, 2.1, 7.8]
mse = mean_squared_error(true_values, pred_values)
mae = mean_absolute_error(true_values, pred_values)

print(f"Mean Squared Error: {mse:.2f}")
print(f"Mean Absolute Error: {mae:.2f}")
Output
Accuracy: 0.71
Precision: 0.75
Recall: 0.75
F1-score: 0.75
Mean Squared Error: 0.29
Mean Absolute Error: 0.47
⚠️

Common Pitfalls

Many beginners make these mistakes when monitoring ML metrics:

  • Relying only on accuracy for imbalanced data, which can be misleading.
  • Ignoring precision and recall when false positives or false negatives matter.
  • Not monitoring loss during training to detect overfitting or underfitting.
  • Using regression metrics on classification tasks or vice versa.
  • Failing to track metrics on validation/test data, only on training data.
python
from sklearn.metrics import accuracy_score, precision_score

# Wrong: Using accuracy only on imbalanced data
true = [0, 0, 0, 0, 1]
pred = [0, 0, 0, 0, 0]
acc = accuracy_score(true, pred)
prec = precision_score(true, pred, zero_division=0)
print(f"Accuracy: {acc:.2f}")  # High accuracy but model misses positive
print(f"Precision: {prec:.2f}")  # Precision shows problem

# Right: Use precision and recall together
from sklearn.metrics import recall_score
rec = recall_score(true, pred, zero_division=0)
print(f"Recall: {rec:.2f}")
Output
Accuracy: 0.80
Precision: 0.00
Recall: 0.00
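The last pitfall, evaluating only on training data, is easy to demonstrate. This is a minimal sketch using an illustrative synthetic dataset and an unpruned decision tree (both assumptions chosen here, not from the article):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic dataset for illustration only
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An unpruned tree can memorize the training set
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"Train accuracy: {train_acc:.2f}")  # typically 1.00 (memorized)
print(f"Test accuracy: {test_acc:.2f}")   # lower: the gap signals overfitting
```

A large gap between training and test scores is the classic symptom of overfitting, which is why metrics should always be tracked on held-out data.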
📊

Quick Reference

Here is a quick summary of key ML metrics to monitor:

| Metric | Type | What it Measures | When to Use |
|---|---|---|---|
| Loss | Training | How well model fits training data | Always during training |
| Accuracy | Classification | Percent correct predictions | Balanced classes |
| Precision | Classification | Correct positive predictions | When false positives are costly |
| Recall | Classification | Detected actual positives | When missing positives is costly |
| F1-score | Classification | Balance of precision and recall | Imbalanced classes |
| Mean Squared Error (MSE) | Regression | Average squared error | Regression tasks |
| Mean Absolute Error (MAE) | Regression | Average absolute error | Regression tasks |
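Several of the classification metrics in this table can be computed in one call with scikit-learn's `classification_report`, which is a convenient way to monitor them together:

```python
from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1]

# Prints per-class precision, recall, F1-score, and support in one table
print(classification_report(y_true, y_pred))
```

This gives the per-class picture that a single accuracy number hides.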
✅

Key Takeaways

  • Always monitor loss and accuracy during training to track model learning.
  • Use precision, recall, and F1-score for classification, especially with imbalanced data.
  • For regression, track mean squared error and mean absolute error to measure prediction quality.
  • Avoid relying on a single metric; combine multiple metrics for a full picture.
  • Monitor metrics on validation/test data to detect overfitting or poor generalization.