Complete the code to zero the gradients before starting the backward pass.
optimizer.[1]()

Before computing gradients, we clear old gradients using zero_grad().
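A minimal sketch of why this matters, assuming PyTorch and a hypothetical one-parameter tensor w: backward() adds into .grad, so stale gradients persist until they are cleared.

```python
import torch

# Hypothetical one-parameter "model": loss = 2 * w, so d(loss)/dw = 2.
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(w * 2).sum().backward()
(w * 2).sum().backward()      # gradients accumulate: 2 + 2
assert w.grad.item() == 4.0   # stale gradient left over

optimizer.zero_grad()         # clears .grad (recent PyTorch sets it to None)
assert w.grad is None or float(w.grad.sum()) == 0.0
```

Note that whether zero_grad() sets .grad to None or to zeros depends on the PyTorch version's set_to_none default; the final check above covers both.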
Complete the code to perform a backward pass on the loss.
loss.[1]()

Calling backward() on the loss computes gradients for all parameters.
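A quick sketch, assuming PyTorch and a hypothetical scalar tensor x, of backward() populating .grad with the derivative of the loss:

```python
import torch

x = torch.tensor([3.0], requires_grad=True)
loss = (x ** 2).sum()     # d(loss)/dx = 2x = 6 at x = 3
loss.backward()           # fills x.grad via autograd
assert x.grad.item() == 6.0
```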
Fix the error in the code to accumulate gradients over multiple batches before the optimizer step.
for i, data in enumerate(dataloader):
    inputs, labels = data
    outputs = model(inputs)
    loss = criterion(outputs, labels) / [1]
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
Dividing the loss by accumulation_steps averages the gradients over that many batches.
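A sketch of this equivalence, assuming PyTorch, a hypothetical linear model, and synthetic micro-batches: accumulating gradients of loss / accumulation_steps matches one backward pass over the averaged per-batch losses, because gradients are linear in the loss.

```python
import torch

torch.manual_seed(0)
accumulation_steps = 4
model = torch.nn.Linear(3, 1)
criterion = torch.nn.MSELoss()
# Synthetic stand-in for four micro-batches of (inputs, targets).
data = [(torch.randn(2, 3), torch.randn(2, 1)) for _ in range(accumulation_steps)]

# Accumulate gradients of the scaled loss over the micro-batches.
model.zero_grad()
for inputs, targets in data:
    (criterion(model(inputs), targets) / accumulation_steps).backward()
accum_grad = model.weight.grad.clone()

# One backward pass over the average of the per-batch losses.
model.zero_grad()
full_loss = sum(criterion(model(x), y) for x, y in data) / accumulation_steps
full_loss.backward()
assert torch.allclose(accum_grad, model.weight.grad, atol=1e-6)
```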
Fill both blanks to correctly implement gradient accumulation and zeroing in the training loop.
optimizer.[1]()
for i, data in enumerate(dataloader):
    inputs, labels = data
    outputs = model(inputs)
    loss = criterion(outputs, labels) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.[2]()
        optimizer.zero_grad()
We zero gradients before the loop and call step() followed by zero_grad() after accumulating gradients.
Fill all three blanks to implement gradient accumulation with correct loss scaling, optimizer step, and zeroing.
optimizer.[1]()
for i, batch in enumerate(dataloader):
    inputs, targets = batch
    outputs = model(inputs)
    loss = criterion(outputs, targets) / [2]
    loss.[3]()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
Zero the gradients before the loop, scale the loss by accumulation_steps, and call backward() on the scaled loss.
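Filled in with the answers the explanation gives, the loop might run as follows; the model, criterion, and synthetic dataloader are hypothetical stand-ins for illustration:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 2

# Synthetic stand-in for a real DataLoader: 6 micro-batches.
dataloader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))
              for _ in range(6)]

optimizer.zero_grad()                    # blank [1]: start with clean gradients
steps_taken = 0
for i, batch in enumerate(dataloader):
    inputs, targets = batch
    outputs = model(inputs)
    loss = criterion(outputs, targets) / accumulation_steps  # blank [2]
    loss.backward()                      # blank [3]
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()                 # update once per accumulation window
        optimizer.zero_grad()
        steps_taken += 1

assert steps_taken == 3                  # 6 micro-batches / 2 per step
```

With accumulation_steps = 2, the optimizer updates once for every two micro-batches, so the effective batch size doubles without extra memory.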