Overview - Zeroing gradients
What is it?
Zeroing gradients means resetting all of a model's stored gradient values to zero before computing gradients for a new training step. Gradients are numbers that tell the model how to change each parameter to reduce its error. Most training setups accumulate gradients: each backward pass adds new gradient values onto whatever is already stored, so without zeroing, leftover values from previous steps would be added into the new ones and produce incorrect updates. This step is essential when training models with gradient-based methods such as gradient descent.
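To make this concrete, here is a minimal sketch in plain Python (no framework): a single parameter w is fitted to y = 2*x by gradient descent, and the stored gradient is updated with += to mimic how training code adds new gradients onto old ones. All names here (train, w, grad, lr) are illustrative, not from any particular library.

```python
def train(zero_each_step=True, steps=50, lr=0.1):
    w = 0.0          # the model's only parameter
    grad = 0.0       # the gradient stored for that parameter
    x, y = 1.0, 2.0  # one training example; the ideal parameter is w = 2
    for _ in range(steps):
        if zero_each_step:
            grad = 0.0               # zeroing: start the step fresh
        # loss = (w*x - y)**2; its derivative w.r.t. w is 2*(w*x - y)*x
        grad += 2 * (w * x - y) * x  # += mimics accumulation into stored grad
        w -= lr * grad               # gradient-descent update
    return w
```

With zero_each_step=True the loop settles near w = 2; with it disabled, stale gradients keep adding up, and the parameter overshoots and oscillates instead of converging.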
Why it matters
Without zeroing gradients, the model would keep adding new gradient values onto old ones, so each update would follow a running sum of past gradients rather than the direction indicated by the current step. Training would become unstable, slow, or fail entirely. Zeroing gradients ensures each learning step starts from a clean slate, allowing the model to improve correctly and efficiently. It is a small but critical step that keeps training stable and reliable.
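The accumulation effect can be seen directly: running the same gradient computation twice without zeroing in between doubles the stored value. This is a hand-rolled sketch; grad_of_loss and stored_grad are illustrative names, not part of any library.

```python
def grad_of_loss(w, x, y):
    # derivative of the squared error (w*x - y)**2 with respect to w
    return 2 * (w * x - y) * x

stored_grad = 0.0
stored_grad += grad_of_loss(w=0.0, x=1.0, y=2.0)  # first backward pass
first = stored_grad                               # -4.0
stored_grad += grad_of_loss(w=0.0, x=1.0, y=2.0)  # same pass again, no zeroing
second = stored_grad                              # -8.0: the gradient doubled
stored_grad = 0.0                                 # zeroing restores a clean slate
```

An update using `second` would move the parameter twice as far as the current data actually warrants, which is exactly the "incorrect learning direction" described above.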
Where it fits
Before learning about zeroing gradients, you should understand what gradients are and how backpropagation computes them. Within a single training step, zeroing comes first: the stored gradients are cleared, backpropagation then computes fresh gradients for the current batch, and finally the optimizer uses them to update the model's parameters. This concept sits early in the training loop and is foundational before learning advanced optimization techniques.
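Putting the pieces in order, a training step can be sketched as: zero, then backpropagate, then update. The runnable toy below uses one parameter and a hand-derived gradient; every name (params, data, lr) is illustrative, not from any specific framework.

```python
params = {"w": 0.0, "grad": 0.0}  # one-parameter "model" with its stored gradient
data = [(1.0, 2.0), (2.0, 4.0)]   # samples drawn from y = 2*x
lr = 0.05                         # learning rate

for epoch in range(100):
    for x, y in data:
        params["grad"] = 0.0                             # 1. zero the gradient
        params["grad"] += 2 * (params["w"] * x - y) * x  # 2. backprop for (w*x - y)**2
        params["w"] -= lr * params["grad"]               # 3. optimizer update
```

Because step 1 runs every iteration, each update reflects only the current sample, and w converges to the true value 2.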