Challenge - 5 Problems
Backward Pass Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
Output of backward pass gradient accumulation
Consider the following PyTorch code snippet. What will be the value of x.grad after running all lines?

import torch
x = torch.tensor(2.0, requires_grad=True)
y = x * 3
loss = y ** 2
loss.backward()
loss = (x * 4) ** 2
loss.backward()
print(x.grad.item())
Attempts: 2 left
💡 Hint
Remember that gradients accumulate by default in PyTorch unless you clear them.
✗ Incorrect
The first backward computes a gradient of 36 (loss = (3x)² = 9x², so d(loss)/dx = 18x = 36 at x = 2); the second computes 64 (loss = (4x)² = 16x², so d(loss)/dx = 32x = 64). Because PyTorch accumulates gradients, x.grad ends up as 36 + 64 = 100.
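The accumulation above can be sketched directly; the extra x.grad.zero_() call at the end (not part of the quiz code) shows how clearing the gradient isolates a single backward pass:

```python
import torch

# Two backward passes on fresh graphs add their gradients into x.grad
# unless x.grad is cleared in between.
x = torch.tensor(2.0, requires_grad=True)

loss = (x * 3) ** 2          # 9x^2, d/dx = 18x = 36 at x = 2
loss.backward()
first = x.grad.item()        # 36.0

loss = (x * 4) ** 2          # 16x^2, d/dx = 32x = 64 at x = 2
loss.backward()
accumulated = x.grad.item()  # 36 + 64 = 100.0

x.grad.zero_()               # clearing resets the accumulation
loss = (x * 4) ** 2
loss.backward()
alone = x.grad.item()        # 64.0, the second gradient on its own
```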
🧠 Conceptual
Intermediate · 1:30 remaining
Effect of calling loss.backward() multiple times without zeroing gradients
In PyTorch, what happens if you call loss.backward() multiple times on different losses without resetting gradients?
Attempts: 2 left
💡 Hint
Think about how PyTorch handles gradients by default.
✗ Incorrect
PyTorch accumulates gradients in the .grad attribute by default: each backward pass adds to whatever is already stored there. You must manually zero the gradients (typically with optimizer.zero_grad()) to avoid unintended accumulation.
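A minimal sketch of the usual fix, using a toy one-parameter model: calling zero_grad() at the top of each step ensures every backward pass starts from a clean .grad.

```python
import torch

# Without opt.zero_grad(), gradients from prior iterations would pile up
# in w.grad and corrupt every subsequent update.
w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

for _ in range(3):
    opt.zero_grad()          # clear accumulated gradients from the last step
    loss = (w * 2) ** 2      # 4w^2, so d(loss)/dw = 8w
    loss.backward()          # w.grad now holds only this step's gradient
    opt.step()               # w <- w - lr * w.grad
```

Each step multiplies w by (1 - 0.1 * 8) = 0.2, so after three steps w is about 0.008.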
❓ Hyperparameter
Advanced · 1:30 remaining
Choosing correct learning rate with backward pass
You notice your model's loss does not decrease after several backward passes and optimizer steps. Which learning rate adjustment is most likely to help?
Attempts: 2 left
💡 Hint
Think about what happens if the learning rate is too high.
✗ Incorrect
A learning rate that is too high can cause the optimizer to overshoot minima and fail to converge. Decreasing it often gets training moving again.
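The overshoot effect is easy to see on the toy loss f(x) = x² (gradient 2x), without any framework. This is an illustrative sketch, not training advice: each gradient step multiplies x by (1 - 2·lr), so any lr above 1 makes the iterate grow instead of shrink.

```python
# Plain gradient descent on f(x) = x^2; gradient is 2x.
def run(lr, steps=20, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x   # x is scaled by (1 - 2*lr) each step
    return abs(x)

diverged = run(lr=1.5)   # |1 - 2*1.5| = 2, so |x| doubles every step
converged = run(lr=0.1)  # |1 - 2*0.1| = 0.8, so |x| shrinks toward 0
```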
🔧 Debug
Advanced · 1:30 remaining
Identifying error in backward pass usage
What error will this PyTorch code raise when calling loss.backward()?

import torch
x = torch.tensor(1.0)
y = x * 2
loss = y ** 2
loss.backward()
Attempts: 2 left
💡 Hint
Check if the tensor requires gradients.
✗ Incorrect
The tensor x does not have requires_grad=True, so no computation graph is built and loss has no grad_fn. Calling backward raises a RuntimeError complaining that the tensor does not require grad.
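The fix is a one-line change: create x with requires_grad=True so autograd records the graph. A sketch of both the failure and the corrected version:

```python
import torch

# Broken: no graph is recorded, so backward() raises RuntimeError.
bad = torch.tensor(1.0)
try:
    ((bad * 2) ** 2).backward()
except RuntimeError:
    pass  # "element 0 of tensors does not require grad ..."

# Fixed: requires_grad=True makes autograd track every operation on x.
x = torch.tensor(1.0, requires_grad=True)
y = x * 2
loss = y ** 2        # 4x^2, so d(loss)/dx = 8x
loss.backward()      # succeeds; x.grad is 8.0 at x = 1
```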
❓ Model Choice
Expert · 2:30 remaining
Choosing model for efficient backward pass in large-scale training
You want to train a very deep neural network with millions of parameters efficiently. Which approach helps reduce memory usage during the backward pass?
Attempts: 2 left
💡 Hint
Think about trading computation for memory during backward.
✗ Incorrect
Gradient checkpointing saves memory by not storing all intermediate activations during the forward pass; the missing activations are recomputed as needed during the backward pass, trading extra computation for a smaller memory footprint.
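A minimal sketch of the technique using torch.utils.checkpoint: the block's activations are discarded after the forward pass and recomputed when backward reaches it. The layer sizes here are arbitrary illustration values.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A small sub-network whose activations we choose not to keep in memory.
block = torch.nn.Sequential(
    torch.nn.Linear(16, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 16),
)

x = torch.randn(4, 16, requires_grad=True)

# checkpoint() runs the block without saving intermediate activations;
# they are recomputed during the backward pass.
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()   # gradients flow to x and block's parameters as usual
```

In a deep network, wrapping every few layers this way roughly trades one extra forward pass through each checkpointed block for not having to store its activations.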