The requires_grad flag in PyTorch controls whether a tensor tracks operations for gradient computation. It does not directly affect model accuracy or loss, but it determines whether training can happen at all: if requires_grad is False, gradients are never computed for that tensor, and the model cannot learn through it. The key metric is therefore whether gradients are correctly computed and used to update model weights, which shows up as a steadily decreasing loss and non-zero gradient norms.
requires_grad flag in PyTorch - Model Metrics & Evaluation
Since requires_grad is about gradient tracking, a confusion matrix is not applicable. Instead, consider this simple illustration of gradient flow:
Tensor A (requires_grad=True) --> Operations --> Gradient computed
Tensor B (requires_grad=False) --> Operations --> No gradient computed
If requires_grad=False, gradients are None, so no learning happens.
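The gradient-flow illustration above can be reproduced in a few lines. This is a minimal sketch with made-up tensor values; the point is that after backward(), the tracked tensor has a .grad while the untracked one stays None:

```python
import torch

# Tensor A tracks operations; Tensor B does not.
a = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=False)

loss = (a * b).sum()  # d(loss)/da = b = 3.0
loss.backward()

print(a.grad)  # tensor([3.]) -- gradient computed
print(b.grad)  # None        -- no gradient computed
```

Because b never entered the autograd graph, the optimizer would have nothing to apply to it, which is exactly why "no gradient" means "no learning" for that tensor.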
Think of requires_grad=True as turning on a learning mode. If you forget to set it, your model won't learn (like a student not paying attention). Setting it True everywhere might slow training or use more memory, like trying to learn everything at once.
Tradeoff:
- True: Model learns but uses more memory and compute.
- False: Saves memory but no learning happens for those tensors.
Use requires_grad=True for parameters you want to train, and False for fixed parts (like pretrained layers you don't want to change).
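A common way to apply this tradeoff is freezing part of a model. The sketch below uses a hypothetical two-layer nn.Sequential (standing in for a pretrained backbone plus a new head); only the unfrozen parameters are handed to the optimizer:

```python
import torch
import torch.nn as nn

# Hypothetical model: treat the first layer as "pretrained" and freeze it.
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

for param in model[0].parameters():
    param.requires_grad = False  # frozen: no gradients, no updates, less memory

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

print(len(trainable))  # 2 -- the second layer's weight and bias
```

Filtering the optimizer's parameter list is optional (frozen params would simply receive no gradients), but it makes the intent explicit and avoids surprises with optimizers that track per-parameter state.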
Good:
- Gradients are computed for all trainable parameters (non-zero gradient norms).
- Loss decreases steadily during training.
- Model weights update correctly.
Bad:
- Gradients are zero or None for parameters that should learn.
- Loss stays flat or does not improve.
- Model weights do not change after training steps.
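The checks above are easy to automate. This is a small diagnostic sketch (the helper name report_grad_norms and the tiny nn.Linear model are illustrative, not a standard API) that prints each parameter's gradient norm after backward(); a None or zero norm on a trainable parameter is the red flag described above:

```python
import torch
import torch.nn as nn

def report_grad_norms(model):
    """Print each parameter's gradient norm after backward().
    None or 0.0 for a trainable parameter signals a problem."""
    for name, p in model.named_parameters():
        norm = None if p.grad is None else p.grad.norm().item()
        print(f"{name}: requires_grad={p.requires_grad}, grad_norm={norm}")

model = nn.Linear(3, 1)
loss = model(torch.randn(2, 3)).sum()
loss.backward()
report_grad_norms(model)
```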
- Forgetting to set requires_grad=True on model parameters, so no learning happens.
- Setting requires_grad=True on tensors that should remain fixed, wasting memory and compute.
- Mixing tensors with different requires_grad flags, causing unexpected gradient flow.
- Not detaching tensors when needed, causing memory leaks.
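The detaching pitfall in particular deserves a concrete illustration. In this sketch, storing a loss tensor directly for logging keeps its whole computation graph alive; detaching (or converting to a plain float with .item()) breaks that link:

```python
import torch

x = torch.randn(3, requires_grad=True)
loss = (x * 2).sum()

history = []
# history.append(loss)        # risky: retains the computation graph each step
history.append(loss.detach())  # safe: a plain tensor with no graph attached
scalar_log = loss.item()       # also safe: a plain Python float

print(history[0].requires_grad)  # False
```

In a training loop this difference compounds: an undetached tensor appended every iteration keeps every iteration's graph in memory, which looks like a slow memory leak.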
- Assuming accuracy or loss alone shows requires_grad issues; always check gradients.
Your model has 98% accuracy but the gradients for some parameters are always zero because their requires_grad flag is False. Is this model good for training? Why or why not?
Answer: No, it is not good for training those parameters. Even if accuracy looks high, parameters with requires_grad=False won't update because no gradients are computed. This means the model can't improve those parts, which may limit learning or cause poor generalization.
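The claim in the answer can be verified directly. This sketch (using an illustrative nn.Linear with its weight frozen) shows that after backward() and an optimizer step, the frozen parameter has no gradient and its values are byte-for-byte unchanged:

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
model.weight.requires_grad = False  # frozen, by design or by mistake

before = model.weight.clone()
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)

loss = model(torch.ones(4, 2)).sum()
loss.backward()
optimizer.step()

print(model.weight.grad)                       # None -- never computed
print(torch.equal(model.weight, before))       # True -- weight did not move
```

Only the bias (still trainable) updates here, which is exactly the silent failure mode the interview question probes: metrics can look fine while part of the model is not learning at all.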