Transfer learning for small datasets in TensorFlow - Model Metrics & Evaluation

Which metric matters and WHY

When using transfer learning on small datasets, accuracy is the most common way to gauge how well the model predicts. But accuracy alone can be misleading when classes are imbalanced.

We also focus on precision and recall because small datasets can cause the model to miss important cases (low recall) or make many wrong positive predictions (low precision).

The F1 score, the harmonic mean of precision and recall, balances the two and gives a better overall picture than either alone.
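To see why the harmonic mean is a stricter summary than a simple average, here is a quick sketch (the precision/recall values are hypothetical):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# A balanced pair scores well...
print(round(f1_score(0.80, 0.80), 2))  # 0.8
# ...but a lopsided pair is penalized far below its arithmetic mean (0.5).
print(round(f1_score(0.90, 0.10), 2))  # 0.18
```

Because the harmonic mean collapses toward the smaller value, a model cannot hide a weak recall behind a strong precision (or vice versa).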

Confusion Matrix Example

      Actual \ Predicted | Positive | Negative
      -------------------|----------|---------
      Positive           |    40    |   10
      Negative           |    5     |   45

      Total samples = 100
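The four cells can be tallied directly from label lists. A minimal sketch, using made-up labels constructed to reproduce the matrix above (1 = positive, 0 = negative):

```python
# Hypothetical labels reproducing the matrix above:
# 50 actual positives (40 predicted 1, 10 predicted 0),
# 50 actual negatives (5 predicted 1, 45 predicted 0).
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 40 + [0] * 10 + [1] * 5 + [0] * 45

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(tp, fn, fp, tn)  # 40 10 5 45
```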

From this matrix:

  • True Positives (TP) = 40
  • False Negatives (FN) = 10
  • False Positives (FP) = 5
  • True Negatives (TN) = 45

Precision = 40 / (40 + 5) = 0.89

Recall = 40 / (40 + 10) = 0.80

Accuracy = (40 + 45) / 100 = 0.85

F1 Score = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84
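The arithmetic above can be checked in a few lines, plugging in the cell counts from the matrix:

```python
tp, fn, fp, tn = 40, 10, 5, 45  # counts from the confusion matrix above

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fn + fp + tn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"accuracy={accuracy:.2f} f1={f1:.2f}")
# precision=0.89 recall=0.80 accuracy=0.85 f1=0.84
```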

Precision vs Recall Tradeoff

In transfer learning with small data, you might see:

  • High precision, low recall: Model is careful and only predicts positive when very sure, but misses many real positives. Good when false alarms are costly.
  • High recall, low precision: Model finds most positives but also many false alarms. Good when missing positives is costly.

Example: If you use transfer learning to detect a rare disease from only a few images, high recall matters most so that all sick patients are caught, even if some healthy patients are flagged incorrectly.
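This tradeoff is usually controlled by the decision threshold applied to the model's predicted probabilities. A toy sketch (the scores are hypothetical, not from a real model):

```python
def precision_recall(y_true, scores, threshold):
    """Compute precision and recall after thresholding probability scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predicted probabilities for 4 positives and 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.65, 0.35, 0.2, 0.1]

# High threshold: cautious model -> high precision, low recall.
p, r = precision_recall(y_true, scores, 0.7)
print(round(p, 2), round(r, 2))  # 1.0 0.5

# Low threshold: eager model -> lower precision, high recall.
p, r = precision_recall(y_true, scores, 0.3)
print(round(p, 2), round(r, 2))  # 0.67 1.0
```

Raising the threshold trades recall for precision; lowering it does the reverse, which is why the right operating point depends on whether false alarms or missed positives are costlier.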

Good vs Bad Metric Values

Good:

  • Accuracy above 80% on small data with transfer learning
  • Precision and recall both above 75%
  • F1 score balanced and close to precision and recall

Bad:

  • High accuracy but very low recall (e.g., 98% accuracy but 10% recall)
  • Precision very low, causing many false positives
  • F1 score much lower than the higher of precision and recall, revealing a large gap between them

Common Pitfalls

  • Accuracy paradox: High accuracy can hide poor performance on minority classes in small datasets.
  • Data leakage: Using test data during training inflates metrics falsely.
  • Overfitting: Transfer learning can overfit small data if too many layers are retrained.
  • Ignoring class imbalance: Not adjusting for rare classes can mislead metric interpretation.

Self Check

Your transfer learning model on a small dataset has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production?

Answer: No. The model misses most positive cases (low recall), which is critical in fraud detection. Despite high accuracy, it fails to catch fraud effectively.
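The paradox in this scenario can be made concrete with hypothetical counts (10,000 transactions, 1% fraud; the numbers are illustrative):

```python
# Hypothetical fraud dataset: 10,000 samples, 100 actual frauds (1%).
tp, fn = 12, 88     # only 12 of 100 frauds caught -> recall 0.12
fp, tn = 112, 9788  # the vast majority of legitimate transactions are correct

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.98 recall=0.12
```

Because negatives dominate, the model earns 98% accuracy while missing 88 of 100 frauds, which is exactly the accuracy paradox listed above.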

Key Result

For transfer learning on small datasets, balance precision and recall using the F1 score to ensure reliable predictions beyond accuracy alone.