PyTorch · ~5 mins

Gradient accumulation and zeroing in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is gradient accumulation in PyTorch?
Gradient accumulation is a technique where gradients are summed over multiple mini-batches before updating model weights. This helps simulate a larger batch size without increasing memory usage.
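As a rough sketch of the idea (the model, optimizer, data, and `accumulation_steps` below are illustrative placeholders, not part of the card), a gradient accumulation loop might look like:

```python
import torch

# Minimal sketch of gradient accumulation: sum gradients over several
# small mini-batches, then take one optimizer step.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

accumulation_steps = 4  # effective batch = mini-batch size * 4

optimizer.zero_grad()
for step in range(8):  # stand-in for iterating over a DataLoader
    x = torch.randn(2, 4)            # small mini-batch of inputs
    y = torch.randn(2, 1)            # matching targets
    loss = loss_fn(model(x), y)
    # Scale so the summed gradient matches a large-batch average
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()             # update with accumulated gradients
        optimizer.zero_grad()        # clear before the next cycle
```

Note the single `zero_grad()` per accumulation cycle: between steps, the gradients are deliberately allowed to add up.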
beginner
Why do we need to zero gradients in PyTorch during training?
We zero gradients to clear old gradient values from the previous backward pass. Without zeroing, gradients would keep accumulating unintentionally, leading to incorrect updates.
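A toy example makes this concrete: PyTorch adds each `backward()` result into `.grad`, so stale gradients persist until cleared (the tensor `w` here is just a made-up scalar parameter).

```python
import torch

# Demonstration: .grad accumulates across backward() calls
# unless it is explicitly zeroed.
w = torch.ones(1, requires_grad=True)

(2 * w).sum().backward()
first = w.grad.clone()        # d/dw of 2w is 2 -> tensor([2.])

(2 * w).sum().backward()      # no zeroing in between
second = w.grad.clone()       # old 2 + new 2 -> tensor([4.])

w.grad.zero_()                # manual equivalent of optimizer.zero_grad()
```

The second backward pass silently doubles the gradient, which is exactly the "unintentional accumulation" the card warns about.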
beginner
How do you zero gradients in PyTorch?
Call optimizer.zero_grad() before the backward pass, typically at the start of each training iteration. It resets the gradients of every parameter the optimizer manages (in recent PyTorch versions it sets them to None by default, which behaves equivalently).
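For reference, a conventional (non-accumulating) training loop places the call like this; the model, data, and loop length are illustrative stand-ins:

```python
import torch

# Typical placement: zero_grad() at the start of each iteration,
# before loss.backward().
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for _ in range(3):
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()              # compute fresh gradients
    optimizer.step()             # apply them
```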
intermediate
What happens if you forget to zero gradients in a training loop?
Gradients from all previous batches accumulate, causing the model to update weights incorrectly and potentially harming training performance.
intermediate
How does gradient accumulation help when GPU memory is limited?
By accumulating gradients over several small batches, you can simulate a larger batch size without needing to load all data at once, saving memory.
What PyTorch function is used to clear gradients before a backward pass?
A. optimizer.step()
B. model.zero_grad()
C. loss.backward()
D. optimizer.zero_grad()
Why accumulate gradients over multiple batches?
A. To increase effective batch size without extra memory
B. To speed up training by skipping backward passes
C. To avoid zeroing gradients
D. To reduce model size
What happens if you call optimizer.step() without zeroing gradients first?
A. Weights update with accumulated gradients from previous steps
B. Weights do not update
C. Training crashes
D. Gradients reset automatically
Which of these is NOT a reason to use gradient accumulation?
A. Limited GPU memory
B. Simulate larger batch size
C. Avoid zeroing gradients
D. Improve training stability
When should optimizer.zero_grad() be called in the training loop?
A. After optimizer.step()
B. Before loss.backward()
C. After loss.backward()
D. At the end of training
Explain how gradient accumulation works and why it is useful in training deep learning models.
Think about training with small batches but wanting the effect of a big batch.
Describe the importance of zeroing gradients in PyTorch and what could happen if you skip this step.
Consider what happens if gradients keep adding up every batch.