PyTorch · ~10 mins

Gradient accumulation and zeroing in PyTorch - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to zero the gradients before starting the backward pass.

PyTorch
optimizer.[1]()
Options:
A. step
B. backward
C. zero_grad
D. eval
Common Mistakes
Calling optimizer.step() before zeroing gradients
Forgetting to zero gradients causing accumulation
Using backward() on optimizer instead of loss
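A minimal runnable sketch of the behavior this task tests (the one-parameter tensor and SGD optimizer are assumptions for illustration, not part of the exercise): gradients from backward() are summed into .grad until optimizer.zero_grad() clears them.

```python
import torch

# Toy setup: a single trainable parameter and an SGD optimizer.
w = torch.ones(1, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(2 * w).backward()        # d(2w)/dw = 2
first = w.grad.item()     # 2.0

(2 * w).backward()        # without zeroing, gradients add up
second = w.grad.item()    # 4.0

optimizer.zero_grad()     # clears .grad for the next iteration
```

Note that in recent PyTorch versions zero_grad() sets .grad to None by default (set_to_none=True) rather than filling it with zeros; either way the stale value is gone.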
Task 2: Fill in the blank (medium)

Complete the code to perform a backward pass on the loss.

PyTorch
loss.[1]()
Options:
A. detach
B. zero_grad
C. step
D. backward
Common Mistakes
Calling zero_grad() on loss instead of optimizer
Calling step() on loss instead of optimizer
Forgetting to call backward() causing no gradient computation
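As a runnable sketch (the tensors and the squared-error loss are assumptions for illustration): backward() belongs to the loss tensor, not the optimizer, and calling it fills in .grad on every leaf tensor in the computation graph.

```python
import torch

x = torch.tensor([1.0, 2.0])
w = torch.zeros(2, requires_grad=True)

loss = ((x * w) - 1.0).pow(2).sum()
assert w.grad is None     # no gradient computed before backward()

loss.backward()           # gradient: 2*(x*w - 1)*x = [-2, -4]
```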
Task 3: Fill in the blank (hard)

Fill in the blank so the code accumulates gradients over multiple batches before the optimizer step.

PyTorch
for i, data in enumerate(dataloader):
    inputs, labels = data
    outputs = model(inputs)
    loss = criterion(outputs, labels) / [1]
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
Options:
A. accumulation_steps
B. num_epochs
C. len(dataloader)
D. batch_size
Common Mistakes
Dividing loss by batch_size instead of accumulation_steps
Not dividing loss causing gradients to be too large
Calling optimizer.step() every batch instead of after accumulation
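A runnable version of the filled-in loop; the toy linear model, random batches, and hyperparameters are assumptions for illustration, while the loop structure matches the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 2

# stand-in for a real DataLoader: four (inputs, labels) batches
dataloader = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(4)]

optimizer.zero_grad()
for i, data in enumerate(dataloader):
    inputs, labels = data
    outputs = model(inputs)
    # dividing by accumulation_steps keeps the accumulated gradient
    # at the scale of one averaged large-batch gradient
    loss = criterion(outputs, labels) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```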
Task 4: Fill in the blank (hard)

Fill both blanks to correctly implement gradient accumulation and zeroing in the training loop.

PyTorch
optimizer.[1]()
for i, data in enumerate(dataloader):
    inputs, labels = data
    outputs = model(inputs)
    loss = criterion(outputs, labels) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.[2]()
        optimizer.zero_grad()
Options:
A. zero_grad
B. step
C. backward
D. eval
Common Mistakes
Calling zero_grad() before step(), discarding the accumulated gradients
Forgetting to zero gradients before loop
Calling zero_grad() every batch instead of after accumulation steps
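A sketch (the toy model and data are assumptions) checking the property the loop relies on: two accumulated half-batches, with each loss divided by accumulation_steps, produce the same gradient as one full-batch pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8, 3)
y = torch.randn(8, 1)

full = nn.Linear(3, 1)
accum = nn.Linear(3, 1)
accum.load_state_dict(full.state_dict())  # identical starting weights

# one big batch, no scaling
F.mse_loss(full(x), y).backward()
g_full = full.weight.grad.clone()

# two half-batches, each loss divided by accumulation_steps = 2
for xb, yb in [(x[:4], y[:4]), (x[4:], y[4:])]:
    (F.mse_loss(accum(xb), yb) / 2).backward()
g_accum = accum.weight.grad.clone()
```

The two gradients agree up to floating-point error, which is why the scaled-accumulation pattern behaves like training with a larger batch.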
Task 5: Fill in the blank (hard)

Fill all three blanks to implement gradient accumulation with correct loss scaling, optimizer step, and zeroing.

PyTorch
optimizer.[1]()
for i, batch in enumerate(dataloader):
    inputs, targets = batch
    outputs = model(inputs)
    loss = criterion(outputs, targets) / [2]
    loss.[3]()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
Options:
A. zero_grad
B. accumulation_steps
C. backward
D. step
Common Mistakes
Not scaling loss causing large gradients
Calling backward() before the initial zero_grad(), letting stale gradients leak into the first update
Forgetting to zero gradients before loop
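With all three blanks filled in (zero_grad, accumulation_steps, backward), the loop trains end to end. A runnable sketch on a toy regression problem (the model, synthetic data, and learning rate are assumptions for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 2

def make_batch():
    # synthetic regression target: y = x1 - x2
    xb = torch.randn(16, 2)
    return xb, xb @ torch.tensor([[1.0], [-1.0]])

dataloader = [make_batch() for _ in range(20)]

optimizer.zero_grad()                    # blank [1]: zero before the loop
losses = []
for i, batch in enumerate(dataloader):
    inputs, targets = batch
    outputs = model(inputs)
    loss = criterion(outputs, targets) / accumulation_steps  # blank [2]
    loss.backward()                      # blank [3]: accumulate gradients
    losses.append(loss.item())
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Because the blanks are filled correctly, the recorded loss falls over the course of the loop; swapping step() and zero_grad(), or skipping the loss scaling, breaks that.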