Transfer learning for small datasets in TensorFlow - Model Metrics & Evaluation

Which metric matters and WHY

When using transfer learning on small datasets, accuracy is the most common way to gauge how well the model predicts. But accuracy alone can be misleading when classes are imbalanced.

We also focus on precision and recall because small datasets can cause the model to miss important cases (low recall) or make many wrong positive predictions (low precision).

The F1 score, the harmonic mean of precision and recall, balances the two and gives a better overall picture than either alone.
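To see why the harmonic mean is a stricter summary than a simple average, here is a quick sketch (the precision/recall values are hypothetical):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# A balanced pair scores well...
print(round(f1_score(0.80, 0.80), 2))  # 0.8
# ...but a lopsided pair is penalized far below its arithmetic mean (0.5).
print(round(f1_score(0.90, 0.10), 2))  # 0.18
```

Because the harmonic mean collapses toward the smaller value, a model cannot hide a weak recall behind a strong precision (or vice versa).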

Confusion Matrix Example

      Actual \ Predicted | Positive | Negative
      -------------------|----------|---------
      Positive           |    40    |   10
      Negative           |    5     |   45

      Total samples = 100
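The four cells can be tallied directly from label lists. A minimal sketch, using made-up labels constructed to reproduce the matrix above (1 = positive, 0 = negative):

```python
# Hypothetical labels reproducing the matrix above:
# 50 actual positives (40 predicted 1, 10 predicted 0),
# 50 actual negatives (5 predicted 1, 45 predicted 0).
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 40 + [0] * 10 + [1] * 5 + [0] * 45

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(tp, fn, fp, tn)  # 40 10 5 45
```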

From this matrix:

  • True Positives (TP) = 40
  • False Negatives (FN) = 10
  • False Positives (FP) = 5
  • True Negatives (TN) = 45

Precision = 40 / (40 + 5) = 0.89

Recall = 40 / (40 + 10) = 0.80

Accuracy = (40 + 45) / 100 = 0.85

F1 Score = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84
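The arithmetic above can be checked in a few lines, plugging in the cell counts from the matrix:

```python
tp, fn, fp, tn = 40, 10, 5, 45  # counts from the confusion matrix above

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fn + fp + tn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"accuracy={accuracy:.2f} f1={f1:.2f}")
# precision=0.89 recall=0.80 accuracy=0.85 f1=0.84
```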

Precision vs Recall Tradeoff

In transfer learning with small data, you might see:

  • High precision, low recall: Model is careful and only predicts positive when very sure, but misses many real positives. Good when false alarms are costly.
  • High recall, low precision: Model finds most positives but also many false alarms. Good when missing positives is costly.

Example: If you use transfer learning to detect a rare disease from only a few images, high recall matters most so that all sick patients are caught, even if some healthy patients are flagged incorrectly.
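This tradeoff is usually controlled by the decision threshold applied to the model's predicted probabilities. A toy sketch (the scores are hypothetical, not from a real model):

```python
def precision_recall(y_true, scores, threshold):
    """Compute precision and recall after thresholding probability scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predicted probabilities for 4 positives and 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.65, 0.35, 0.2, 0.1]

# High threshold: cautious model -> high precision, low recall.
p, r = precision_recall(y_true, scores, 0.7)
print(round(p, 2), round(r, 2))  # 1.0 0.5

# Low threshold: eager model -> lower precision, high recall.
p, r = precision_recall(y_true, scores, 0.3)
print(round(p, 2), round(r, 2))  # 0.67 1.0
```

Raising the threshold trades recall for precision; lowering it does the reverse, which is why the right operating point depends on whether false alarms or missed positives are costlier.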

Good vs Bad Metric Values

Good:

  • Accuracy above 80% on small data with transfer learning
  • Precision and recall both above 75%
  • F1 score balanced and close to precision and recall

Bad:

  • High accuracy but very low recall (e.g., 98% accuracy but 10% recall)
  • Precision very low, causing many false positives
  • F1 score much lower than the higher of precision and recall, revealing a large gap between them

Common Pitfalls

  • Accuracy paradox: High accuracy can hide poor performance on minority classes in small datasets.
  • Data leakage: Using test data during training inflates metrics falsely.
  • Overfitting: Transfer learning can overfit small data if too many layers are retrained.
  • Ignoring class imbalance: Not adjusting for rare classes can mislead metric interpretation.

Self Check

Your transfer learning model on a small dataset has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production?

Answer: No. The model misses most positive cases (low recall), which is critical in fraud detection. Despite high accuracy, it fails to catch fraud effectively.
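The paradox in this scenario can be made concrete with hypothetical counts (10,000 transactions, 1% fraud; the numbers are illustrative):

```python
# Hypothetical fraud dataset: 10,000 samples, 100 actual frauds (1%).
tp, fn = 12, 88     # only 12 of 100 frauds caught -> recall 0.12
fp, tn = 112, 9788  # the vast majority of legitimate transactions are correct

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.98 recall=0.12
```

Because negatives dominate, the model earns 98% accuracy while missing 88 of 100 frauds, which is exactly the accuracy paradox listed above.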

Key Result

For transfer learning on small datasets, balance precision and recall using the F1 score to ensure reliable predictions beyond accuracy alone.