
Train/val/test split in PyTorch - Model Metrics & Evaluation

Metrics & Evaluation - Train/val/test split
Which metric matters for Train/val/test split and WHY

When we split data into train, validation, and test sets, the main goal is to check how well our model learns and how well it will work on new data.

Training metrics (like loss and accuracy) show how well the model learns from the training data.

Validation metrics help us tune the model and avoid overfitting by checking performance on unseen data during training.

Test metrics give a final unbiased estimate of how the model will perform in the real world.

So the key metrics depend on the task (accuracy, loss, precision, recall, and so on), but each should be measured separately on the train, val, and test sets to understand model behavior.
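A minimal sketch of the split itself, using PyTorch's `random_split` (the dataset sizes and seed here are illustrative, not from the lesson):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset: 1000 samples, 10 features each.
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, y)

# 70/15/15 split with a fixed seed so the split is reproducible.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(
    dataset, [700, 150, 150], generator=generator
)

print(len(train_set), len(val_set), len(test_set))  # 700 150 150
```

Fixing the generator seed matters: if the split changes between runs, validation metrics from different experiments are not comparable.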

Confusion matrix example for validation set

|                 | Predicted Positive     | Predicted Negative      |
|-----------------|------------------------|-------------------------|
| Actual Positive | True Positive (TP): 50 | False Negative (FN): 10 |
| Actual Negative | False Positive (FP): 5 | True Negative (TN): 35  |

Total samples in val set = TP + FN + FP + TN = 50 + 10 + 5 + 35 = 100


This matrix helps calculate precision, recall, and accuracy on the validation set.
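Working those metrics out from the counts in the matrix above:

```python
# Counts from the validation-set confusion matrix above.
TP, FN, FP, TN = 50, 10, 5, 35

accuracy = (TP + TN) / (TP + FN + FP + TN)  # 85 / 100 = 0.85
precision = TP / (TP + FP)                  # 50 / 55 ≈ 0.909
recall = TP / (TP + FN)                     # 50 / 60 ≈ 0.833

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.3f}")
```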

Precision vs Recall tradeoff in validation

Imagine a spam email detector:

  • High precision means most emails marked as spam really are spam (few good emails wrongly blocked).
  • High recall means most spam emails are caught (few spam emails missed).

During validation, we adjust the model to balance precision and recall depending on what matters more.

Train/val/test split lets us test these tradeoffs on validation data before final testing.
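One common way to trade precision against recall on the validation set is to move the decision threshold on the model's scores. A small sketch (the scores and labels here are made up for illustration):

```python
import torch

# Hypothetical classifier scores and true labels (1 = spam).
scores = torch.tensor([0.95, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2])
labels = torch.tensor([1,    1,   1,   0,   1,   0,   0,   0  ])

def precision_recall(threshold):
    preds = (scores >= threshold).long()
    tp = ((preds == 1) & (labels == 1)).sum().item()
    fp = ((preds == 1) & (labels == 0)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A high threshold blocks fewer good emails (precision up, recall down);
# a low threshold catches more spam (recall up, precision down).
print(precision_recall(0.7))   # (1.0, 0.75)
print(precision_recall(0.35))  # (0.666..., 1.0)
```

Because the threshold is tuned on validation data, the test set still gives an honest estimate of the chosen operating point.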

Good vs Bad metric values for train/val/test split

Good:

  • Training accuracy: 90%
  • Validation accuracy: 88%
  • Test accuracy: 87%
  • Close values show the model generalizes well.

Bad:

  • Training accuracy: 95%
  • Validation accuracy: 60%
  • Test accuracy: 58%
  • Big gap means overfitting; model memorizes training but fails on new data.
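To compare accuracies like these across the three sets, the same evaluation loop is run on each loader. A minimal sketch, assuming an untrained toy model and random data (both illustrative):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()
def accuracy(model, loader):
    # eval() disables training-only behavior such as dropout;
    # no_grad() skips gradient bookkeeping during evaluation.
    model.eval()
    correct = total = 0
    for xb, yb in loader:
        preds = model(xb).argmax(dim=1)
        correct += (preds == yb).sum().item()
        total += yb.numel()
    return correct / total

# Tiny illustrative setup: a linear model on random data.
torch.manual_seed(0)
model = nn.Linear(4, 2)
data = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=16)

acc = accuracy(model, loader)
print(f"accuracy: {acc:.2f}")
```

The same function would be called with the train, val, and test loaders; a large gap between the returned values is the overfitting signal described above.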

Common pitfalls with train/val/test split metrics
  • Data leakage: If test data leaks into training, test metrics look too good and mislead.
  • Imbalanced splits: Unequal class distribution in splits can give wrong metric impressions.
  • Overfitting: High train but low val/test metrics show model memorizes instead of learning.
  • Small validation/test sets: Too few samples make metrics unstable and unreliable.
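The imbalanced-splits pitfall can be avoided with a stratified split: split each class separately so every subset keeps the overall class ratio. A sketch with made-up numbers (90 negatives, 10 positives):

```python
import torch

# Toy imbalanced labels: 90 negatives, 10 positives (illustrative).
y = torch.cat([torch.zeros(90), torch.ones(10)]).long()
indices = torch.randperm(len(y), generator=torch.Generator().manual_seed(0))

# Stratified 80/20 split: take 80% of each class for training,
# so both subsets keep the same class ratio as the full dataset.
train_idx, val_idx = [], []
for cls in y.unique():
    cls_idx = indices[y[indices] == cls]
    cut = int(0.8 * len(cls_idx))
    train_idx.extend(cls_idx[:cut].tolist())
    val_idx.extend(cls_idx[cut:].tolist())

print(len(train_idx), len(val_idx))           # 80 20
print(y[val_idx].float().mean().item())       # 0.1, same positive rate as overall
```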

Self-check question

Your model has 98% accuracy on training but only 12% recall on fraud cases in validation. Is it good for production?

Answer: No. The model misses most fraud cases (low recall), which is dangerous. Despite high training accuracy, it does not generalize well. You need to improve recall on validation before trusting the model.

Key Result
Train/val/test split metrics reveal if the model learns well and generalizes; watch for gaps indicating overfitting or data issues.