Complete the code to zero the gradients before starting the backward pass.
optimizer.[1]()
Before computing new gradients, clear the old ones by calling optimizer.zero_grad(); otherwise PyTorch accumulates gradients across backward passes.
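A minimal sketch of the answer, using a hypothetical single parameter `w` and an SGD optimizer to show that zero_grad() clears a previously populated gradient:

```python
import torch

# Hypothetical toy setup: one trainable parameter and an SGD optimizer.
w = torch.nn.Parameter(torch.tensor([1.0]))
optimizer = torch.optim.SGD([w], lr=0.1)

(w * 2).sum().backward()  # backward pass populates w.grad
optimizer.zero_grad()     # clears the stale gradient before the next pass
```

Depending on the PyTorch version, zero_grad() either resets w.grad to None (the default in recent releases) or fills it with zeros.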
Complete the code to perform a backward pass on the loss.
loss.[1]()
To compute gradients, call loss.backward(), which backpropagates the error through the computation graph.
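A minimal sketch of the answer, using an assumed scalar input `x` so the gradient is easy to verify by hand (d(x²)/dx = 2x):

```python
import torch

x = torch.tensor([3.0], requires_grad=True)
loss = (x ** 2).sum()  # dloss/dx = 2x = 6 at x = 3
loss.backward()        # backpropagates, filling x.grad
```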
Complete the code to accumulate gradients over multiple batches before updating weights.
if (batch_idx + 1) % [1] == 0:
    optimizer.step()
    optimizer.zero_grad()
Gradient accumulation updates weights every accumulation_steps batches.
Fill both blanks to correctly scale the loss for gradient accumulation.
loss = loss / [1]
loss.[2]()
We divide the loss by accumulation_steps so the accumulated gradients average (rather than sum) over the micro-batches, then call backward() to accumulate them.
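A small sketch of why the scaling works, under the assumption accumulation_steps = 2 and treating each element of a toy tensor `x` as one micro-batch: the accumulated gradient matches the gradient of the full-batch mean loss.

```python
import torch

accumulation_steps = 2  # assumed value for this sketch
x = torch.tensor([1.0, 2.0], requires_grad=True)

for mini in x:  # each element stands in for one micro-batch
    loss = (mini ** 2) / accumulation_steps  # scale before backward
    loss.backward()                          # gradients add up in x.grad
```

After the loop, x.grad equals the gradient of mean(x ** 2), i.e. x itself, which is what a single full-batch backward pass would have produced.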
Fill all three blanks to implement gradient accumulation correctly in a training loop.
for batch_idx, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    loss = criterion(outputs, targets) / [1]
    loss.[2]()
    if (batch_idx + 1) % [3] == 0:
        optimizer.step()
        optimizer.zero_grad()
Divide the loss by accumulation_steps, call backward() to accumulate gradients, and update the weights every accumulation_steps batches.
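The filled-in loop can be run end to end as a sketch; the model, criterion, synthetic dataloader, and accumulation_steps = 4 below are assumptions for illustration, not part of the exercise:

```python
import torch

# Assumed toy setup: linear model, MSE loss, synthetic micro-batches.
torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
dataloader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]
accumulation_steps = 4

optimizer.zero_grad()
for batch_idx, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    loss = criterion(outputs, targets) / accumulation_steps
    loss.backward()  # gradients accumulate across micro-batches
    if (batch_idx + 1) % accumulation_steps == 0:
        optimizer.step()       # one weight update per 4 micro-batches
        optimizer.zero_grad()  # reset for the next accumulation cycle
```

With 8 micro-batches and accumulation_steps = 4, this performs exactly two optimizer steps.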