Consider the following PyTorch code that performs a backward pass and then prints the gradient of a tensor. What will be printed?
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x * 3
z = y ** 2
z.backward()
print(x.grad)
x.grad.zero_()
print(x.grad)
Remember that backward() computes gradients and zero_() sets them to zero in-place.
Since y = 3x and z = y^2 = 9x^2, the chain rule gives dz/dx = 2y * dy/dx = 2*(3x)*3 = 18x, which is 18*2 = 36 at x = 2. The first print therefore shows the gradient tensor, tensor([36.]). zero_() then sets the gradient to zero in place, so the second print shows tensor([0.]).
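The hand-derived gradient can be checked against autograd directly; this is an illustrative sketch, not part of the original question:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
z = (x * 3) ** 2            # z = 9x^2, so dz/dx = 18x
z.backward()

analytic = 18 * x.detach()  # hand-derived gradient, 18x
print(x.grad)               # tensor([36.])
assert torch.allclose(x.grad, analytic)

x.grad.zero_()              # in-place zeroing
print(x.grad)               # tensor([0.])
```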
In a typical training loop, which method should be called to clear gradients before computing new ones?
Think about which object manages the parameters and their gradients during optimization.
The optimizer.zero_grad() method clears the gradients of all optimized tensors. This is necessary before calling loss.backward() to avoid gradient accumulation from previous steps.
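A minimal sketch of where zero_grad() sits in a training step (the model, data, and learning rate here are illustrative placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

optimizer.zero_grad()   # clear stale gradients from the previous step
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()         # populate .grad on every parameter
optimizer.step()        # apply the update
```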
Consider a training loop where optimizer.zero_grad() is never called. What is the effect on the gradients during training?
Think about how PyTorch accumulates gradients by default.
PyTorch accumulates gradients on each backward call. If you don't zero them, gradients from previous steps add up, leading to larger updates than expected.
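This accumulation is easy to observe in isolation; a small sketch (the tensor values are arbitrary):

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

# Two backward passes without zeroing in between:
(x * 5).backward()
print(x.grad)    # tensor([5.])
(x * 5).backward()
print(x.grad)    # tensor([10.]) -- gradients are summed, not replaced
```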
Examine the code below. Why does it raise an error at x.grad.zero_()?
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.sum()
y.backward()
x.grad = None
x.grad.zero_()
What happens if you assign None to a variable and then call a method on it?
Setting x.grad = None discards the gradient tensor, so x.grad is no longer a tensor. Calling zero_() on it then raises AttributeError, because NoneType has no zero_ method.
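One defensive pattern is to guard in-place gradient operations behind a None check; a minimal sketch of that fix:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
x.sum().backward()

x.grad = None             # a valid way to drop the gradient tensor
if x.grad is not None:    # guard before in-place ops on .grad
    x.grad.zero_()
print(x.grad)             # None -- no error raised
```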
In mini-batch gradient descent, why must gradients be zeroed before processing each batch?
Consider how PyTorch handles gradients across multiple backward passes.
PyTorch accumulates gradients by default. Zeroing gradients before each batch ensures that only the current batch's gradients affect the update, preventing unintended accumulation.
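The per-batch placement described above can be sketched as follows (the model and synthetic batches are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Synthetic mini-batches of (inputs, targets):
batches = [(torch.randn(4, 2), torch.randn(4, 1)) for _ in range(3)]

for xb, yb in batches:
    opt.zero_grad()    # only this batch's gradients drive the update
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()
```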