Zeroing gradients in PyTorch - Model Metrics & Evaluation
Zeroing gradients is a step in the training loop of a neural network, not a metric itself, but it directly affects how metrics like loss and accuracy evolve. By default, PyTorch accumulates gradients: without zeroing, gradients from previous batches are added to those of the current batch, producing incorrect parameter updates, unstable training, and poor metrics.
Zeroing gradients has no confusion matrix of its own. Instead, think of it like clearing a whiteboard before writing new notes: if you don't erase the old notes, the new ones mix with them and cause confusion.
Before zeroing gradients:
Gradients = Gradients from previous batch + Gradients from current batch
After zeroing gradients:
Gradients = Gradients from current batch only
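This accumulation can be seen directly in PyTorch: calling `backward()` without first zeroing the stored gradient adds the new gradient to the old one. A minimal sketch using a standalone tensor rather than a full model:

```python
import torch

# A scalar parameter; d(x*x)/dx = 2x, so the true gradient at x=2 is 4.
x = torch.tensor(2.0, requires_grad=True)

(x * x).backward()
print(x.grad)  # tensor(4.) — gradient from the first backward pass

# Without zeroing, the next backward() ADDS to the stored gradient.
(x * x).backward()
print(x.grad)  # tensor(8.) — 4 (old) + 4 (new), not the true gradient

# Zeroing first leaves only the gradient of the current pass.
x.grad.zero_()
(x * x).backward()
print(x.grad)  # tensor(4.)
```

In a model, `optimizer.zero_grad()` does this zeroing for every parameter at once.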
Zeroing gradients affects training stability, which in turn affects metrics like precision and recall. If gradients are not zeroed, the model may learn incorrectly, causing both precision and recall to drop. Proper zeroing helps the model improve both metrics steadily.
Example: In a spam detector, if gradients are not zeroed, the model might confuse spam and non-spam emails, lowering precision (more false positives) and recall (missing spam emails).
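In practice, the zeroing happens once per batch via `optimizer.zero_grad()`. A minimal training-loop sketch fitting a toy linear model (the data, model size, and learning rate here are illustrative assumptions, not from the original text):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy data standing in for a real dataset: targets follow y = 2x.
X = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * X

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()        # clear gradients from the previous batch
    loss = loss_fn(model(X), y)  # forward pass on the current batch
    loss.backward()              # gradients for the current batch only
    optimizer.step()             # parameter update

print(loss.item())  # steadily shrinks toward 0 over training
```

Moving `optimizer.zero_grad()` outside the loop would let every batch's gradients pile up, which is exactly the failure mode described above.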
Good training metrics after zeroing gradients show steady loss decrease and rising accuracy, precision, and recall. Bad metrics show unstable or no improvement, indicating gradient issues.
- Good: Loss decreases each batch, accuracy rises smoothly.
- Bad: Loss jumps up and down, accuracy does not improve or worsens.
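The good/bad contrast can be simulated without any framework. Below, gradient descent on the toy loss w² (gradient 2w) converges when the gradient buffer is cleared each step, but keeps oscillating at full amplitude when gradients accumulate; all numbers are illustrative:

```python
def train(steps=30, lr=0.1, zero_grad=True):
    """Minimize loss(w) = w**2 with SGD; optionally skip zeroing the gradient."""
    w, grad = 1.0, 0.0
    history = []
    for _ in range(steps):
        if zero_grad:
            grad = 0.0      # clear the buffer, like optimizer.zero_grad()
        grad += 2 * w       # backward() ADDS the current gradient
        w -= lr * grad      # optimizer.step()
        history.append(abs(w))
    return history

good = train(zero_grad=True)
bad = train(zero_grad=False)
print(good[-1])        # ~0.001: |w| shrinks every step, training converges
print(max(bad[-10:]))  # ~1.0: still swinging as far as it did at the start
```

The "good" run mirrors a smoothly decreasing loss curve; the "bad" run mirrors the loss jumping up and down without improving.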
- Not zeroing gradients: gradients from previous batches accumulate into the current update, corrupting every optimizer step.
- Data leakage: mixing training and test data inflates evaluation metrics and can hide gradient problems.
- Overfitting indicators: if training metrics improve while validation metrics worsen, check (among other causes) whether gradients are handled properly.
- Accuracy paradox: high accuracy can be misleading, especially on imbalanced data, where a model that mostly predicts the majority class scores well without learning correctly.
Your model has 98% accuracy but 12% recall on fraud detection. Is it good for production? Why not?
Answer: No. With 12% recall the model misses 88% of fraud cases, which is dangerous in production. The 98% accuracy is the accuracy paradox in action: fraud is rare, so a model that almost always predicts "not fraud" can score high accuracy while catching almost no fraud. A model stuck on the majority class like this can also point to broken training steps, such as gradients not being zeroed properly each batch.
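The arithmetic behind the question is easy to reproduce. With hypothetical confusion-matrix counts chosen to match 98% accuracy and 12% recall (10,000 transactions, 200 of them fraud — these numbers are assumptions for illustration):

```python
# Hypothetical counts for an imbalanced fraud-detection dataset.
total = 10_000
tp = 24     # fraud correctly flagged
fn = 176    # fraud missed
fp = 24     # legitimate transactions wrongly flagged
tn = total - tp - fn - fp  # legitimate transactions correctly passed

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)

print(f"accuracy:  {accuracy:.0%}")   # 98% — looks great
print(f"recall:    {recall:.0%}")     # 12% — misses 88% of fraud
print(f"precision: {precision:.0%}")  # 50%
```

Accuracy is dominated by the 9,776 true negatives; recall is the metric that exposes the failure.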