Validation split in TensorFlow - Model Metrics & Evaluation

When using a validation split, the key metrics to watch are validation loss and validation accuracy. They measure how well the model performs on data it never saw during training, which tells us whether the patterns it learned generalize beyond the training set. If validation loss keeps decreasing and validation accuracy keeps increasing, the model is generalizing well.
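A minimal sketch of how this looks in practice, assuming TensorFlow 2.x; the data and model architecture below are placeholders, and the key piece is the `validation_split` argument to `model.fit`, which reports `val_loss` and `val_accuracy` after every epoch:

```python
import numpy as np
import tensorflow as tf

# Toy binary-classification data (hypothetical; stands in for a real dataset).
rng = np.random.default_rng(0)
X = rng.random((1000, 20)).astype("float32")
y = (X[:, 0] > 0.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# validation_split=0.2 holds out the last 20% of the (unshuffled) data
# and evaluates loss/accuracy on it after every epoch.
history = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)

print(history.history["val_loss"][-1])
print(history.history["val_accuracy"][-1])
```

Note that `validation_split` takes the hold-out slice from the end of the arrays before any shuffling, so the data should already be shuffled if its order is not random.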
For classification tasks, a confusion matrix on the validation set helps us see detailed performance:
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive           |       40 |       10
Negative           |        5 |       45
Here, True Positives (TP) = 40, False Negatives (FN) = 10, False Positives (FP) = 5, True Negatives (TN) = 45.
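From these four counts, accuracy, precision, and recall follow directly; a quick check in plain Python using the numbers in the table above:

```python
# Counts from the validation-set confusion matrix above.
tp, fn, fp, tn = 40, 10, 5, 45

precision = tp / (tp + fp)                   # 40 / 45 ≈ 0.889
recall = tp / (tp + fn)                      # 40 / 50 = 0.800
accuracy = (tp + tn) / (tp + fn + fp + tn)   # 85 / 100 = 0.850

print(precision, recall, accuracy)
```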
Validation split helps us measure both precision and recall on unseen data. For example, in a spam filter:
- High precision means few good emails are wrongly marked as spam.
- High recall means most spam emails are caught.
Depending on the goal, validation metrics guide us to tune the model to balance precision and recall before final testing.
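One common way to tune that balance is to adjust the decision threshold applied to the model's predicted probabilities on the validation set. A sketch with hypothetical scores (the helper `precision_recall` and the numbers are illustrative, not from the document):

```python
import numpy as np

# Hypothetical validation labels and model scores.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.55, 0.6])

def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    pred = (scores >= threshold).astype(int)
    tp = int(((pred == 1) & (y_true == 1)).sum())
    fp = int(((pred == 1) & (y_true == 0)).sum())
    fn = int(((pred == 0) & (y_true == 1)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Lowering the threshold catches more positives (higher recall)
# at the cost of more false positives (lower precision).
print(precision_recall(y_true, scores, 0.5))  # (0.75, 0.75)
print(precision_recall(y_true, scores, 0.3))  # precision drops, recall hits 1.0
```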
Good: Validation accuracy close to training accuracy, and validation loss steadily decreasing or stable.
Bad: Validation accuracy much lower than training accuracy, or validation loss increasing while training loss decreases (sign of overfitting).
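The "bad" pattern above can be flagged programmatically from the training history. One simple heuristic (an illustrative sketch, not a standard API) compares the trend of training and validation loss over the last few epochs:

```python
def overfitting_signal(train_loss, val_loss, window=3):
    """Flag likely overfitting: validation loss rising while
    training loss keeps falling over the last `window` epochs."""
    if len(train_loss) < window + 1 or len(val_loss) < window + 1:
        return False
    train_falling = train_loss[-1] < train_loss[-1 - window]
    val_rising = val_loss[-1] > val_loss[-1 - window]
    return train_falling and val_rising

# Training loss falls, validation loss turns upward -> overfitting.
print(overfitting_signal([1.0, 0.8, 0.6, 0.5, 0.4],
                         [0.9, 0.85, 0.8, 0.85, 0.95]))  # True
```

In Keras the same idea is usually handled by the built-in `tf.keras.callbacks.EarlyStopping` with `monitor="val_loss"`.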
- Data leakage: If validation data leaks into training (for example, preprocessing statistics computed on the full dataset), validation metrics become too optimistic.
- Overfitting: Validation loss rising while training loss falls means the model is memorizing the training data.
- Small validation set: A validation split that is too small gives noisy, unreliable metrics.
- Ignoring metric trends: Looking only at final accuracy, without checking loss curves or other metrics, can mislead.
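The leakage and small-split pitfalls are often addressed by splitting the raw data explicitly before any preprocessing, with shuffling and stratification so rare classes appear in both splits. A sketch assuming scikit-learn is available (the arrays are placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1).astype("float32")
y = np.array([0] * 90 + [1] * 10)  # imbalanced: only 10% positives

# Split BEFORE fitting any scalers/encoders, so nothing computed on the
# validation rows can leak into training. stratify=y keeps the class
# ratio the same in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(len(y_val), int(y_val.sum()))  # 20 validation rows, 2 positives
```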
Your model has 98% accuracy on the training set but only 12% recall on fraud cases in validation. Is it a good model?
Answer: No. The model misses most fraud cases (low recall), which is critical for fraud detection. Despite high accuracy, it is not reliable for production.
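Hypothetical counts showing how both numbers can hold at once: when fraud is rare, a model can score 98% accuracy while catching almost no fraud (the specific counts below are illustrative, chosen to reproduce the 98% / 12% figures):

```python
# 1250 validation transactions, 25 of them fraud (2% fraud rate).
tp, fn = 3, 22     # catches only 3 of 25 fraud cases
tn, fp = 1222, 3   # almost everything predicted "not fraud"

accuracy = (tp + tn) / (tp + fn + tn + fp)  # 1225 / 1250 = 0.98
recall = tp / (tp + fn)                     # 3 / 25 = 0.12

print(accuracy, recall)
```

This is why the class-imbalance pitfall makes accuracy alone a misleading metric for fraud detection.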