PyTorch · ~5 mins

Backward pass (loss.backward) in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the purpose of the backward pass in PyTorch?
The backward pass computes gradients of the loss with respect to model parameters. It helps the model learn by showing how to adjust weights to reduce errors.
beginner
What does the method loss.backward() do in PyTorch?
It calculates the gradients of the loss tensor with respect to all tensors that have requires_grad=True. These gradients are stored in the .grad attribute of each tensor.
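A minimal sketch of the point above: after `loss.backward()`, the gradient of the loss with respect to each tracked tensor appears in its `.grad` attribute. The values here are chosen so the gradient is easy to verify by hand.

```python
import torch

# Tiny example: y = w * x, loss = (y - target)^2
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)           # no requires_grad: treated as a constant
target = torch.tensor(7.0)

loss = (w * x - target) ** 2    # loss = (6 - 7)^2 = 1
loss.backward()                 # computes dloss/dw and stores it in w.grad

# Analytically: dloss/dw = 2 * (w*x - target) * x = 2 * (-1) * 3 = -6
print(w.grad)  # tensor(-6.)
```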
intermediate
Why do we need to call optimizer.zero_grad() before loss.backward()?
PyTorch accumulates gradients in .grad by default, so optimizer.zero_grad() clears the values left over from the previous iteration. Without it, gradients from successive backward passes would add together and corrupt the update.
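The accumulation behavior is easy to see on a single tensor. A sketch: for `loss = 2 * w`, each backward pass adds 2 to `w.grad`; zeroing the gradient first (which is what `optimizer.zero_grad()` does for the parameters it manages, by zeroing or resetting them to `None`) restores the correct value.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: dloss/dw = 2 for loss = 2 * w
(2 * w).backward()
print(w.grad)  # tensor(2.)

# Second backward pass WITHOUT clearing: gradients accumulate (2 + 2)
(2 * w).backward()
print(w.grad)  # tensor(4.)

# Clear first, as optimizer.zero_grad() would, then the value is correct again
w.grad.zero_()
(2 * w).backward()
print(w.grad)  # tensor(2.)
```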
beginner
What happens if you forget to call loss.backward() during training?
No gradients will be computed, so the optimizer cannot update the model weights. The model will not learn or improve.
intermediate
How does PyTorch know which operations to track for gradient computation?
PyTorch builds a computation graph dynamically during the forward pass. It tracks operations on tensors with requires_grad=True to compute gradients during the backward pass.
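You can inspect this graph directly: every tensor produced by a tracked operation carries a `grad_fn` node pointing back into the graph, and `detach()` cuts a tensor out of it. A small sketch:

```python
import torch

a = torch.tensor(3.0, requires_grad=True)
b = a * 2           # tracked: b.grad_fn references the multiply node
c = b.detach() + 1  # detach() severs the graph: c is not tracked

print(b.requires_grad, b.grad_fn is not None)  # True True
print(c.requires_grad, c.grad_fn)              # False None
```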
What does loss.backward() compute?
A. The optimizer step
B. Gradients of loss with respect to model parameters
C. The updated model weights
D. The loss value itself
Answer: B
Before calling loss.backward(), why do we call optimizer.zero_grad()?
A. To clear old gradients so they don't accumulate
B. To compute the loss value
C. To update the model weights
D. To save the model
Answer: A
If a tensor has requires_grad=False, what happens during loss.backward()?
A. Gradient is computed normally
B. The tensor is updated automatically
C. An error occurs
D. No gradient is computed for that tensor
Answer: D
What is stored in the .grad attribute after loss.backward()?
A. The updated tensor value
B. The loss value
C. The gradient of the loss with respect to the tensor
D. The optimizer state
Answer: C
What does PyTorch use to track operations for gradient calculation?
A. A dynamic computation graph
B. A static computation graph
C. A database
D. A configuration file
Answer: A
Explain in your own words what happens during the backward pass when you call loss.backward() in PyTorch.
Think about how PyTorch figures out how to change weights to reduce loss.
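As a reference answer, the whole cycle can be sketched as one training step; the model, data, and `opt` name here are illustrative, not from the original:

```python
import torch
import torch.nn as nn

# Hypothetical tiny setup for illustration
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

opt.zero_grad()                              # 1. clear stale gradients
loss = nn.functional.mse_loss(model(x), y)   # 2. forward pass builds the graph
loss.backward()                              # 3. backward pass fills p.grad
opt.step()                                   # 4. optimizer reads .grad to update weights

# every trainable parameter now has a populated gradient
assert all(p.grad is not None for p in model.parameters())
```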
Why is it important to call optimizer.zero_grad() before loss.backward() in a training loop?
Consider what happens if gradients from previous steps mix with current ones.