PyTorch · ~10 mins

Gradient accumulation in PyTorch - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to zero the gradients before starting the backward pass.

PyTorch
optimizer.[1]()
A. zero_grad
B. step
C. backward
D. eval
Common Mistakes
Calling optimizer.step() before zeroing the gradients
Calling backward() on the optimizer instead of the loss
Forgetting to clear gradients, which causes unintended accumulation
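The intended ordering can be sketched as a minimal runnable example (the toy model, optimizer, and data below are illustrative, not part of the exercise):

```python
import torch

# Toy setup — these names are hypothetical, chosen only for illustration
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()
inputs, targets = torch.randn(4, 2), torch.randn(4, 1)

optimizer.zero_grad()                      # 1. clear any stale gradients
loss = criterion(model(inputs), targets)
loss.backward()                            # 2. compute gradients of the loss
optimizer.step()                           # 3. apply the weight update
```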
2. Fill in the blank (medium)

Complete the code to perform a backward pass on the loss.

PyTorch
loss.[1]()
A. detach
B. backward
C. zero_grad
D. step
Common Mistakes
Calling optimizer.step() instead of loss.backward()
Forgetting to call backward(), so no gradients are computed
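A small sketch of what backward() actually does, using a hand-checkable loss (the parameter and data here are made up for illustration): gradients do not exist until backward() is called, after which autograd fills the parameter's .grad attribute.

```python
import torch

# loss = sum(w * x), so d(loss)/dw = x — easy to verify by hand
w = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
x = torch.tensor([3.0, 4.0])
loss = (w * x).sum()

print(w.grad)    # None — no gradients before backward()
loss.backward()  # autograd populates w.grad
print(w.grad)    # tensor([3., 4.])
```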
3. Fill in the blank (hard)

Complete the code to accumulate gradients over multiple batches before updating the weights.

PyTorch
if (batch_idx + 1) % [1] == 0:
    optimizer.step()
    optimizer.zero_grad()
A. num_epochs
B. batch_size
C. learning_rate
D. accumulation_steps
Common Mistakes
Using batch_size instead of accumulation_steps
Calling optimizer.step() on every batch, so gradients never accumulate
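To see which batches the modulo check selects, here is a plain-Python trace with hypothetical values (accumulation_steps = 4 over 12 batches):

```python
# With 0-indexed batches, (batch_idx + 1) % accumulation_steps == 0
# fires on every accumulation_steps-th batch: indices 3, 7, 11, ...
accumulation_steps = 4
update_batches = [b for b in range(12) if (b + 1) % accumulation_steps == 0]
print(update_batches)  # [3, 7, 11]
```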
4. Fill in the blank (hard)

Fill both blanks to correctly scale the loss for gradient accumulation.

PyTorch
loss = loss / [1]
loss.[2]()
A. accumulation_steps
B. batch_size
C. backward
D. step
Common Mistakes
Not scaling the loss, which makes the accumulated gradients too large
Calling optimizer.step() instead of loss.backward() here
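The arithmetic behind the scaling can be illustrated with made-up numbers: gradients add up across successive backward() calls, so dividing each batch loss by accumulation_steps turns the accumulated sum into an average over the accumulated batches.

```python
# Illustrative values only — four batch losses accumulated before one update
accumulation_steps = 4
batch_losses = [2.0, 4.0, 6.0, 8.0]

unscaled_total = sum(batch_losses)                                # 20.0 — too large
scaled_total = sum(l / accumulation_steps for l in batch_losses)  # 5.0 — the mean
print(unscaled_total, scaled_total)
```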
5. Fill in the blank (hard)

Fill all three blanks to implement gradient accumulation correctly in a training loop.

PyTorch
for batch_idx, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    loss = criterion(outputs, targets) / [1]
    loss.[2]()
    if (batch_idx + 1) % [3] == 0:
        optimizer.step()
        optimizer.zero_grad()
A. accumulation_steps
B. backward
C. step
Common Mistakes
Using different variables for the loss scaling and the update frequency (both should be accumulation_steps)
Forgetting to zero the gradients after the optimizer step
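Putting the pieces together, the full pattern from this task can be run end to end; the tiny model and in-memory "dataloader" below are hypothetical stand-ins so the sketch is self-contained:

```python
import torch

# Illustrative setup — not the exercise's actual model or data
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()
accumulation_steps = 4
dataloader = [(torch.randn(8, 2), torch.randn(8, 1)) for _ in range(8)]

num_updates = 0
optimizer.zero_grad()
for batch_idx, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    # the SAME variable scales the loss and gates the update frequency
    loss = criterion(outputs, targets) / accumulation_steps
    loss.backward()  # gradients accumulate in each parameter's .grad
    if (batch_idx + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        num_updates += 1

print(num_updates)  # 8 batches / 4 accumulation steps = 2 updates
```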