Multi-GPU training in PyTorch - Model Metrics & Evaluation
When training on multiple GPUs, the main goal is to speed up training without losing model quality. The key metrics are therefore training time per epoch and model accuracy or loss: we want the model to learn just as well (accuracy/loss) but faster (time). If accuracy drops, the speed gain is not useful. If training time does not improve, multi-GPU is not effective.
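A minimal sketch of tracking these two metrics, using a toy linear model and random data (the model, data, and loop here are stand-ins, not a real training setup):

```python
import time
import torch
from torch import nn

# Toy single-device baseline; model and data are hypothetical stand-ins.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))

epoch_times, losses = [], []
for epoch in range(3):
    start = time.perf_counter()
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    epoch_times.append(time.perf_counter() - start)  # time per epoch
    losses.append(loss.item())                       # model quality

print(f"mean epoch time: {sum(epoch_times) / len(epoch_times):.4f}s")
print(f"final loss: {losses[-1]:.4f}")
```

Recording both numbers for the single-GPU run gives the baseline that any multi-GPU run must beat on time while matching on loss.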
Multi-GPU training itself does not change the confusion matrix of the model. The confusion matrix shows how well the model predicts classes:
|                     | Predicted Positive  | Predicted Negative  |
|---------------------|---------------------|---------------------|
| **Actual Positive** | True Positive (TP)  | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN)  |
We compare confusion matrices from single-GPU and multi-GPU training to check if predictions stay consistent.
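A simple helper for that comparison, written in plain Python (the label lists below are made-up example predictions, not real model output):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) counts for binary labels."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1  # missed a real positive
        elif p == positive:
            fp += 1  # false alarm
        else:
            tn += 1
    return tp, fn, fp, tn

# Hypothetical predictions from the two runs on the same validation set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
single_gpu_pred = [1, 1, 0, 0, 0, 1, 1, 0]
multi_gpu_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(confusion_counts(y_true, single_gpu_pred))  # (3, 1, 1, 3)
print(confusion_counts(y_true, multi_gpu_pred))   # (3, 1, 1, 3)
```

If the two tuples diverge noticeably on a real validation set, the multi-GPU run changed the model's behavior, not just its speed.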
Multi-GPU training aims to keep model quality the same while speeding up training. If the model trained on multiple GPUs shows lower precision or recall than its single-GPU counterpart, the training process likely has issues such as poor gradient synchronization or an unadjusted learning rate for the larger effective batch size.
Example: If a cancer detector model trained on one GPU has 90% recall, but on multiple GPUs recall drops to 80%, it means the multi-GPU setup hurt the model's ability to find cancer cases. This is bad even if training is faster.
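In numbers, recall is TP / (TP + FN). A sketch of the cancer-detector comparison above, assuming a hypothetical test set with 100 actual cancer cases:

```python
def recall(tp, fn):
    """Fraction of actual positives the model finds."""
    return tp / (tp + fn)

# Hypothetical counts matching the example: 100 real cancer cases.
single_gpu = recall(tp=90, fn=10)  # finds 90 of 100 cases
multi_gpu = recall(tp=80, fn=20)   # finds only 80 of 100 cases
print(single_gpu, multi_gpu)  # 0.9 0.8
```

The ten extra missed cases (FN going from 10 to 20) are exactly the cost that a faster wall-clock time does not justify.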
Good:
- Training time per epoch decreases significantly (e.g., from 60 minutes to 20 minutes).
- Model accuracy and loss remain similar or improve slightly.
- Precision and recall stay stable compared to single-GPU training.
Bad:
- Training time does not improve or improves very little.
- Model accuracy drops noticeably (e.g., from 85% to 75%).
- Precision or recall decrease, indicating worse predictions.
- Training becomes unstable or loss fluctuates wildly.
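These good/bad criteria can be turned into a simple acceptance check. The function below is a hypothetical sketch; the tolerance and minimum-speedup thresholds are assumptions you would tune for your project:

```python
def runs_are_consistent(baseline, multi_gpu, acc_tol=0.01, speedup_min=1.5):
    """Accept a multi-GPU run only if quality holds and time actually drops."""
    acc_ok = multi_gpu["accuracy"] >= baseline["accuracy"] - acc_tol
    speed_ok = baseline["epoch_time"] / multi_gpu["epoch_time"] >= speedup_min
    return acc_ok and speed_ok

# Hypothetical measurements: accuracy held within tolerance, 3x faster.
single = {"accuracy": 0.85, "epoch_time": 60.0}
multi = {"accuracy": 0.84, "epoch_time": 20.0}
print(runs_are_consistent(single, multi))  # True
```

A run that is faster but fails the accuracy check, or accurate but no faster, is rejected, which is exactly the "Bad" list above.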
Common pitfalls:
- Data leakage: If data is not properly split or shuffled across GPUs, the model may see the same data multiple times, inflating accuracy.
- Batch size effects: Multi-GPU training often increases batch size, which can affect learning dynamics and model quality.
- Overfitting: Shorter epochs make it tempting to train for more of them; without early stopping or regular validation checks, the model may overfit.
- Synchronization issues: Improper gradient synchronization can cause model divergence and poor metrics.
- Accuracy paradox: High accuracy with imbalanced data can be misleading; always check precision and recall.
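The data-leakage pitfall is what `DistributedSampler` addresses: each rank gets a disjoint shard of the dataset. A minimal runnable sketch, using a single process with the `gloo` backend to stand in for a real multi-GPU launch (in practice `torchrun` sets the environment variables and rank for each worker):

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Single-process stand-in for a real launch; torchrun normally sets these.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
torch.distributed.init_process_group("gloo", rank=0, world_size=1)

dataset = TensorDataset(torch.arange(8).float())
# DistributedSampler gives each rank a disjoint shard, preventing leakage.
sampler = DistributedSampler(dataset, shuffle=True)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

seen = []
for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle differently each epoch
    for (batch,) in loader:
        seen.extend(batch.tolist())

torch.distributed.destroy_process_group()
```

With `world_size=1` the single rank sees all 8 samples per epoch; with more ranks, each would see a disjoint 8 / world_size slice. Forgetting `set_epoch` makes every epoch use the same shuffle order.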
Question: Your multi-GPU model trains 3 times faster than single-GPU but has 98% accuracy and only 12% recall on fraud detection. Is it good for production? Why or why not?
Answer: No, it is not good. Even though accuracy is high, recall is very low. This means the model misses most fraud cases, which is dangerous. For fraud detection, high recall is critical to catch as many frauds as possible. Speed alone does not make the model useful.
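The arithmetic behind this, using hypothetical counts that produce exactly those metrics (10,000 transactions, only 100 of them fraud):

```python
# Hypothetical fraud-detection counts illustrating the accuracy paradox:
# 10,000 transactions, only 100 are fraud (1% positive class).
tp, fn = 12, 88      # the model catches just 12 of 100 frauds
tn, fp = 9788, 112   # but gets most legitimate transactions right

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(f"accuracy={accuracy:.2%}, recall={recall:.2%}, precision={precision:.2%}")
# accuracy=98.00%, recall=12.00%, precision=9.68%
```

Because 99% of transactions are legitimate, getting them right dominates accuracy, while 88 of 100 frauds slip through. This is why the accuracy paradox above demands checking precision and recall on imbalanced data.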