Imagine you want to teach a robot to recognize cats in photos. The robot uses a neural network that learns by adjusting numbers called weights. Why do we need automatic differentiation for this learning process?
Think about how the robot knows how to improve itself after seeing mistakes.
Automatic differentiation helps calculate gradients, which tell the model how to adjust weights to reduce errors. This is essential for training neural networks efficiently.
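This can be made concrete with a minimal sketch: a one-parameter model whose gradient is easy to verify by hand. The variable names (`w`, `x`, `target`) are illustrative, not from any particular library.

```python
import torch

# A tiny one-parameter "model": predict w * x, with a single training example.
w = torch.tensor(1.0, requires_grad=True)  # weight to be learned
x, target = torch.tensor(2.0), torch.tensor(6.0)

loss = (w * x - target) ** 2   # squared error: (2w - 6)^2
loss.backward()                # autodiff fills in w.grad = d(loss)/dw

# Analytic check: d/dw (2w - 6)^2 = 2 * (2w - 6) * 2 = 4 * (2*1 - 6) = -16
print(w.grad.item())           # -16.0

# One gradient-descent step moves w toward reducing the error:
with torch.no_grad():
    w -= 0.1 * w.grad
print(w.item())                # 1.0 - 0.1 * (-16) = 2.6
```

The gradient tells the model which direction to move `w`; here the negative gradient pushes `w` up toward the value (3.0) that would fit the example exactly.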
What is the output of the following PyTorch code snippet?
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
z = y + 2 * x
z.backward()
print(x.grad.item())
Recall the derivative of z = x^3 + 2x with respect to x.
The derivative is dz/dx = 3x^2 + 2. At x = 2, dz/dx = 3*(2^2) + 2 = 12 + 2 = 14, so the snippet prints 14.0.
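As a cross-check, torch.autograd.grad computes the same derivative without storing it in `.grad`:

```python
import torch

# Recompute dz/dx at x = 2 for z = x^3 + 2x, returning the gradient
# directly instead of accumulating it into x.grad.
x = torch.tensor(2.0, requires_grad=True)
z = x ** 3 + 2 * x

(dz_dx,) = torch.autograd.grad(z, x)
print(dz_dx.item())   # 14.0, matching 3*x**2 + 2 evaluated at x = 2
```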
You want to demonstrate automatic differentiation with a simple model that has a clear gradient. Which model is best?
Think about which model has parameters that can be differentiated easily.
Linear models have a small number of differentiable parameters (a weight and a bias) whose gradients can be verified by hand, making them ideal for demonstrating automatic differentiation. Larger neural networks are also trained with gradients, but their gradients are much harder to check manually, and models such as decision trees or k-nearest neighbors are not trained by gradient descent at all.
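A minimal linear-model sketch shows why: both parameter gradients are simple enough to check against the chain rule. The values chosen here are arbitrary examples.

```python
import torch

# Linear model y_hat = w * x + b with hand-verifiable gradients.
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)

x, y = torch.tensor(3.0), torch.tensor(10.0)
loss = (w * x + b - y) ** 2    # squared error
loss.backward()

# Hand check: let r = w*x + b - y = 2*3 + 0.5 - 10 = -3.5
#   d(loss)/dw = 2 * r * x = 2 * (-3.5) * 3 = -21
#   d(loss)/db = 2 * r     = -7
print(w.grad.item(), b.grad.item())   # -21.0 -7.0
```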
During training with automatic differentiation, you adjust the learning rate. What happens if the learning rate is too high?
Consider what happens if you take very large steps when trying to reach a target.
A high learning rate causes large weight updates, which can overshoot the minimum and make training unstable or diverge.
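The overshoot is easy to demonstrate on f(x) = x^2, whose minimum is at x = 0. The helper function below is an illustrative sketch; each update multiplies x by (1 - 2*lr), so any lr above 1.0 makes |x| grow instead of shrink.

```python
import torch

def run(lr, steps=10):
    # Minimize f(x) = x^2 by plain gradient descent from x = 5.
    x = torch.tensor(5.0, requires_grad=True)
    for _ in range(steps):
        loss = x ** 2
        loss.backward()
        with torch.no_grad():
            x -= lr * x.grad   # update: x <- x * (1 - 2*lr)
        x.grad.zero_()
    return x.item()

print(run(0.1))   # small steps: x shrinks toward the minimum at 0
print(run(1.1))   # each step overshoots past 0 and lands farther away
```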
Consider this PyTorch code:
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
z = y.detach() + 1
z.backward()
What error will occur and why?
Think about what detach() does to the tensor in terms of gradient tracking.
detach() returns a tensor that is cut off from the computation graph and does not track gradients. Since z is built only from that detached tensor (plus a constant), z has requires_grad=False and no grad_fn, so z.backward() raises a RuntimeError stating that the tensor does not require grad and does not have a grad_fn.
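Running the snippet confirms this; wrapping the call in try/except shows the failure without crashing the script:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
z = y.detach() + 1   # z is cut off from the graph: no grad_fn, requires_grad=False

print(z.requires_grad)        # False: nothing upstream of z tracks gradients

try:
    z.backward()
except RuntimeError as e:
    print(type(e).__name__)   # RuntimeError
```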