What is .grad in PyTorch: Explanation and Usage
.grad in PyTorch is an attribute of a tensor that stores the gradient (derivative) of a scalar output with respect to that tensor. During backpropagation it holds the computed gradients needed for updating model parameters.
How It Works
Imagine you are trying to find out how changing one ingredient in a recipe affects the final taste. In machine learning, this is like finding how changing a number (a tensor) affects the final result (loss). The .grad attribute in PyTorch holds this information, called the gradient.
When you run backpropagation, PyTorch calculates gradients automatically and stores them in the .grad attribute of each tensor that requires gradients. This is like writing down how much each ingredient should be adjusted to improve the recipe.
These gradients are then used by optimization algorithms to update the model’s parameters, helping the model learn from data.
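To make this concrete, here is a minimal sketch of a single gradient-descent step done by hand, using the gradient stored in .grad. The parameter value, input, target, and learning rate are illustrative choices, not values from the article:

```python
import torch

# A tiny "model": w is the single parameter we want to learn
w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(3.0)       # input (illustrative value)
target = torch.tensor(6.0)  # desired output (illustrative value)

lr = 0.1  # learning rate (an assumed value)

loss = (w * x - target) ** 2  # squared error
loss.backward()               # fills w.grad with d(loss)/dw = 2*(w*x - target)*x = -18

with torch.no_grad():
    w -= lr * w.grad          # gradient-descent update using the stored gradient
w.grad.zero_()                # clear the gradient before the next step

print(w)  # tensor(2.8000, ...): 1.0 - 0.1 * (-18) = 2.8
```

Real training loops delegate this update to an optimizer such as torch.optim.SGD, but the optimizer reads the same .grad attribute shown here.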
Example
This example shows how to create a tensor, perform a simple operation, run backpropagation, and access the .grad attribute to see the gradient.
import torch

# Create a tensor with gradient tracking enabled
x = torch.tensor(2.0, requires_grad=True)

# Define a simple function y = x^2
y = x ** 2

# Compute gradients (dy/dx)
y.backward()

# Print the gradient stored in x.grad
print(x.grad)  # tensor(4.), since dy/dx = 2x = 4 at x = 2
When to Use
You use .grad when you want to know how a change in a tensor affects a result, especially during training machine learning models. It is essential for updating model weights to minimize errors.
For example, in neural networks, after calculating the loss, you call backward() to compute gradients, then access .grad to see how each parameter should change. This guides the optimizer to improve the model.
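The loop described above can be sketched with a tiny network. The layer sizes, seed, learning rate, and data here are illustrative assumptions, not details from the article:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducibility (seed is arbitrary)

# A minimal one-layer model; sizes are illustrative
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(4, 3)
targets = torch.randn(4, 1)

loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()  # fills .grad for every parameter of the model

# Inspect how each parameter "should change" before the optimizer uses it
print(model.weight.grad)  # one gradient per weight, shape (1, 3)
print(model.bias.grad)    # shape (1,)

optimizer.step()       # updates parameters using the stored .grad values
optimizer.zero_grad()  # clear gradients so they don't accumulate next step
```

Note the zero_grad() call at the end: PyTorch accumulates gradients into .grad across backward passes, so training loops reset them once per step.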
It is also useful in custom gradient calculations or when debugging your model’s learning process.
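For debugging, a common pattern is to walk over the model's parameters after backward() and print each gradient's norm, which helps spot vanishing or exploding gradients. The model architecture and input below are hypothetical:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for reproducibility
model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1))

loss = model(torch.randn(8, 2)).mean()
loss.backward()

# Print per-parameter gradient norms; near-zero or huge values signal trouble
for name, param in model.named_parameters():
    print(f"{name}: grad norm = {param.grad.norm():.4f}")
```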
Key Points
- .grad stores the gradient of a tensor after backpropagation.
- Gradients are used to update model parameters during training.
- Only tensors with requires_grad=True will have .grad populated.
- You must call backward() on a scalar output to compute gradients.
- .grad is None before backpropagation.
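Two of these points, that .grad starts out as None and that gradients accumulate across backward passes, can be verified in a few lines (the values here are illustrative):

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
print(x.grad)  # None: no backward pass has run yet

y = x ** 2
y.backward()
print(x.grad)  # tensor(6.), since dy/dx = 2x = 6 at x = 3

# Gradients accumulate: a second backward pass adds to the stored value
(x ** 2).backward()
print(x.grad)  # tensor(12.): 6 + 6
```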
Key Takeaways
- .grad holds the gradient of a tensor after backpropagation.
- Only tensors with requires_grad=True have gradients computed.
- Call backward() on a scalar to compute gradients.
- .grad is None until gradients are calculated.