How Autograd Works in PyTorch: Simple Explanation and Example
PyTorch's autograd automatically tracks operations on tensors with requires_grad=True and computes gradients during backpropagation via backward(). It builds a dynamic computation graph on the fly, enabling efficient gradient calculation for training models.
Syntax
To use autograd, create tensors with requires_grad=True to track operations. Call backward() on a scalar output to compute gradients. Access gradients via the .grad attribute of tensors.
- tensor = torch.tensor(data, requires_grad=True): creates a tensor that tracks operations.
- output.backward(): computes gradients of output with respect to its inputs.
- tensor.grad: holds the gradient after the backward() call.
python
import torch

# Create tensor with gradient tracking
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Perform operations
y = x * x + 3 * x + 1

# Compute gradients of the sum of y
y.sum().backward()

# Access gradients
print(x.grad)
Output
tensor([7., 9.])
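To see where [7., 9.] comes from, note that d/dx (x² + 3x + 1) = 2x + 3, which evaluates to 7 and 9 at x = 2 and x = 3. A quick sketch checking autograd against this analytic derivative, and peeking at the graph node recorded on y:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x * x + 3 * x + 1

# Each intermediate tensor records the op that produced it,
# forming the dynamic computation graph
print(y.grad_fn)  # a backward node, e.g. AddBackward0

y.sum().backward()

# Analytic check: d/dx (x^2 + 3x + 1) = 2x + 3 -> [7., 9.]
print(torch.allclose(x.grad, 2 * x.detach() + 3))
```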
Example
This example shows how autograd tracks operations and computes gradients automatically. We define a simple function, compute its output, and call backward() to get gradients.
python
import torch

# Create input tensor with gradient tracking
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Define a function of x
y = 2 * x + 3
z = y.pow(2).sum()  # scalar output

# Compute gradients
z.backward()

# Print gradients of x
print('x:', x)
print('Gradient of z w.r.t x:', x.grad)
Output
x: tensor([1., 2., 3.], requires_grad=True)
Gradient of z w.r.t x: tensor([20., 28., 36.])
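These values follow from the chain rule: with z = Σ(2x + 3)², each component of the gradient is dz/dx = 2(2x + 3) · 2 = 4(2x + 3), giving 20, 28, and 36 at x = 1, 2, 3. A minimal sketch verifying this against autograd:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
z = (2 * x + 3).pow(2).sum()
z.backward()

# Chain rule: dz/dx = 2 * (2x + 3) * 2 = 4 * (2x + 3)
expected = 4 * (2 * x.detach() + 3)
print(torch.allclose(x.grad, expected))
print(expected)  # tensor([20., 28., 36.])
```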
Common Pitfalls
Common mistakes when using autograd include:
- Not setting requires_grad=True on input tensors, so gradients are not tracked.
- Calling backward() on a non-scalar tensor without passing a gradient argument.
- Reusing tensors without detaching or zeroing gradients, causing incorrect gradient accumulation.
- Modifying tensors in-place, which can break the computation graph.
python
import torch

# Wrong: requires_grad not set
x = torch.tensor([1.0, 2.0, 3.0])
y = x * 2
try:
    y.backward()
except RuntimeError as e:
    print('Error:', e)

# Right: requires_grad=True
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
# y is not scalar, so specify a gradient argument
y.backward(torch.tensor([1.0, 1.0, 1.0]))
print('Gradients:', x.grad)
Output
Error: element 0 of tensors does not require grad and does not have a grad_fn
Gradients: tensor([2., 2., 2.])
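The in-place pitfall from the list above can also be demonstrated with a minimal sketch. Operations like sigmoid save their output for the backward pass, so modifying that output in place invalidates the graph and backward() raises an error (the exact message varies by PyTorch version):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.sigmoid()  # sigmoid saves its output for the backward pass
y.mul_(2.0)      # in-place edit of a tensor the graph still needs

try:
    y.sum().backward()
except RuntimeError as e:
    # autograd detects the version mismatch and refuses to compute gradients
    print('Error:', e)
```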
Quick Reference
Summary tips for using PyTorch autograd:
- Always set requires_grad=True on tensors you want gradients for.
- Call backward() on scalar outputs to compute gradients.
- Access gradients via the .grad attribute after backward().
- Use zero_grad() on optimizers to clear gradients before new backward calls.
- Detach tensors to stop tracking when needed with .detach().
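A minimal sketch of the accumulation and detach tips (plain tensors here rather than an optimizer, so gradients are cleared with grad.zero_()):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Gradients accumulate across backward() calls
(x * x).sum().backward()  # d/dx x^2 = 2x -> [2., 4.]
(x * x).sum().backward()  # adds another [2., 4.]
print(x.grad)             # tensor([4., 8.])

# Clear before the next backward pass
x.grad.zero_()

# .detach() returns a tensor that no longer tracks operations
d = x.detach()
print(d.requires_grad)    # False
```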
Key Takeaways
PyTorch autograd tracks tensor operations dynamically to compute gradients automatically.
Set requires_grad=True on tensors to enable gradient tracking.
Call backward() on scalar outputs to compute gradients for all dependent tensors.
Access computed gradients via the .grad attribute of tensors.
Avoid common mistakes like missing requires_grad or calling backward() on non-scalar outputs without a gradient argument.