Mixed precision training (AMP) in PyTorch - Model Metrics & Evaluation

Which metric matters for Mixed Precision Training (AMP) and WHY

Mixed precision training uses both 16-bit and 32-bit floating-point formats to speed up training and save memory. The key metrics to watch are training speed (time per epoch), model accuracy, and memory usage: accuracy tells you whether the model still learns as well as in full precision, speed shows the efficiency gain, and memory usage shows the resource savings. The goal is to keep accuracy close to full-precision training while improving both speed and memory.
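A minimal sketch of an AMP training loop in PyTorch, using `torch.autocast` together with a gradient scaler. The model and data here are hypothetical placeholders; on CPU the example falls back to bfloat16 with the scaler disabled, since fp16 autocast is a CUDA feature:

```python
import torch
import torch.nn as nn

# Tiny placeholder model and data, just to show the shape of the AMP loop.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# GradScaler rescales the loss so small fp16 gradients do not underflow;
# when disabled (e.g. on CPU) its calls are pass-throughs.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(64, 16, device=device)
y = torch.randint(0, 2, (64,), device=device)

for _ in range(3):
    optimizer.zero_grad()
    # autocast runs eligible ops in half precision, the rest in fp32.
    amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Wrapping only the forward pass and loss in `autocast`, while keeping the backward pass and optimizer step outside, is the standard pattern: gradients are unscaled back to fp32 inside `scaler.step`.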

Confusion Matrix or Equivalent Visualization

Mixed precision training itself does not change the confusion matrix of the model predictions. However, to check if AMP affects model quality, compare confusion matrices from full precision and mixed precision models.

Full Precision Confusion Matrix (rows: predicted positive/negative; columns: actual positive/negative):
| TP=90 | FP=10 |
| FN=15 | TN=85 |

Mixed Precision Confusion Matrix:
| TP=89 | FP=11 |
| FN=16 | TN=84 |

Total samples = 200

These matrices are nearly identical (overall accuracy 87.5% vs. 86.5%), indicating that AMP preserved model quality.
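Deriving overall accuracy from the two matrices above makes the gap concrete:

```python
def accuracy(tp, fp, fn, tn):
    # Fraction of all predictions that were correct.
    return (tp + tn) / (tp + fp + fn + tn)

acc_fp32 = accuracy(90, 10, 15, 85)   # 175/200 = 0.875
acc_amp  = accuracy(89, 11, 16, 84)   # 173/200 = 0.865
```

The one-percentage-point gap corresponds to just two extra misclassified samples out of 200.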

Precision vs Recall Tradeoff with AMP

AMP can slightly affect precision and recall because of small numerical differences. For example, if precision drops from 0.90 to 0.89 and recall from 0.86 to 0.85, this is usually acceptable given the speed and memory gains.
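Those precision and recall numbers follow directly from the confusion matrices above, via precision = TP/(TP+FP) and recall = TP/(TP+FN):

```python
def precision_recall(tp, fp, fn):
    # precision: of everything flagged positive, how much was right
    # recall: of all actual positives, how many we caught
    return tp / (tp + fp), tp / (tp + fn)

p_fp32, r_fp32 = precision_recall(90, 10, 15)   # 0.90, ~0.857
p_amp,  r_amp  = precision_recall(89, 11, 16)   # 0.89, ~0.848
```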

Think of it like driving a faster car that uses less fuel but rides slightly less smoothly: the small tradeoff is worth the large efficiency gain.

What "Good" vs "Bad" Metric Values Look Like for AMP

Good: Accuracy within 1% of full precision, training speed improved by 20% or more, memory usage reduced by 30% or more.

Bad: Accuracy drops by more than 3%, the speed gain is minimal, or the memory savings are negligible. In that case AMP is either hurting model quality or not delivering its efficiency benefits.
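The rules of thumb above can be expressed as a simple acceptance check. This is a hypothetical helper; the thresholds are the ones stated above, and all inputs are measurements you collect yourself:

```python
def amp_is_worthwhile(acc_fp32, acc_amp,
                      epoch_s_fp32, epoch_s_amp,
                      mem_fp32, mem_amp):
    """Return True when AMP meets the 'good' thresholds above."""
    acc_ok   = (acc_fp32 - acc_amp) <= 0.01       # within 1% of full precision
    speed_ok = epoch_s_amp <= 0.8 * epoch_s_fp32  # at least 20% faster
    mem_ok   = mem_amp <= 0.7 * mem_fp32          # at least 30% less memory
    return acc_ok and speed_ok and mem_ok

# Example: 0.5% accuracy drop, 25% faster, 35% less memory -> acceptable.
ok = amp_is_worthwhile(0.95, 0.945, 100.0, 75.0, 10.0, 6.5)   # True
# Example: 5% accuracy drop fails the check regardless of the gains.
bad = amp_is_worthwhile(0.95, 0.90, 100.0, 75.0, 10.0, 6.5)   # False
```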

Common Pitfalls in Metrics with AMP
  • Ignoring small accuracy drops that accumulate over epochs.
  • Confusing speed improvements from AMP with other optimizations.
  • Not measuring memory usage properly (e.g. peak allocation), so AMP's savings go unverified.
  • Overfitting signs can be hidden if only speed is monitored.
  • Data leakage or bugs can cause misleading accuracy results, unrelated to AMP.

Self Check: Your model has 98% accuracy but 12% recall on fraud. Is it good?

No, it is not good for fraud detection. Even though accuracy is high, recall is very low. This means the model misses most fraud cases, which is dangerous. For fraud, high recall is critical to catch as many frauds as possible, even if some false alarms happen.
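To see how this happens numerically, consider a hypothetical test set of 10,000 transactions of which 200 are fraudulent:

```python
# Hypothetical fraud test set: 10,000 transactions, 200 fraudulent.
tp, fn = 24, 176     # the model catches only 24 of the 200 frauds
fp, tn = 24, 9776    # 24 false alarms among 9,800 legitimate transactions

accuracy = (tp + tn) / (tp + fn + fp + tn)   # 0.98
recall = tp / (tp + fn)                      # 0.12
```

Accuracy is dominated by the 9,800 legitimate transactions, so the model scores 98% accuracy while still missing 88% of the fraud.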

Key Result
Mixed precision training should keep accuracy close to full precision while improving speed and reducing memory use.