Complete the code to add L2 regularization (weight decay) to the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=[1])
Weight decay is set by the weight_decay parameter in the optimizer. A small positive value like 0.001 applies L2 regularization.
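Filled in, the snippet might look like this (the `nn.Linear` model is an assumed stand-in; any `nn.Module` works the same way):

```python
import torch
import torch.nn as nn

# Assumed stand-in model for the exercise.
model = nn.Linear(10, 2)

# weight_decay=0.001 is one valid small positive value for blank [1];
# it applies an L2 penalty to all parameters at every update step.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)
```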
Complete the code to create an SGD optimizer with learning rate 0.1 and weight decay 0.01.
optimizer = torch.optim.SGD(model.parameters(), lr=[1], weight_decay=[2])
The learning rate is set with lr=0.1 and the weight decay with weight_decay=0.01.
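With the blanks filled, the answer reads as follows (the model is an assumed placeholder):

```python
import torch
import torch.nn as nn

# Assumed placeholder model.
model = nn.Linear(10, 2)

# Blank [1] = 0.1 (learning rate), blank [2] = 0.01 (weight decay).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.01)
```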
Fix the error in the optimizer creation by filling the correct weight decay value.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=[1])
Weight decay must be a non-negative float. Negative values are rejected by PyTorch with a ValueError, and non-numeric values such as strings also cause errors.
Fill both blanks to create an Adam optimizer with learning rate 0.0005 and weight decay 0.0001.
optimizer = torch.optim.Adam(model.parameters(), lr=[1], weight_decay=[2])
The learning rate is 0.0005 and weight decay is 0.0001 as specified.
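The completed code might be (model is an assumed stand-in):

```python
import torch
import torch.nn as nn

# Assumed stand-in model.
model = nn.Linear(10, 2)

# Blank [1] = 0.0005 (learning rate), blank [2] = 0.0001 (weight decay).
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005, weight_decay=0.0001)
```

Note that Adam's weight_decay adds the L2 penalty to the gradient before the adaptive update; torch.optim.AdamW instead applies decoupled weight decay, which is often preferred in practice.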
Fill all three blanks to create an SGD optimizer with momentum 0.9, learning rate 0.01, and weight decay 0.0005.
optimizer = torch.optim.SGD(model.parameters(), momentum=[1], lr=[2], weight_decay=[3])
Momentum is 0.9, learning rate is 0.01, and weight decay is 0.0005 as required.
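All three blanks filled in, the snippet might run like this (the model is an assumed placeholder):

```python
import torch
import torch.nn as nn

# Assumed placeholder model.
model = nn.Linear(10, 2)

# Blank [1] = 0.9 (momentum), blank [2] = 0.01 (learning rate),
# blank [3] = 0.0005 (weight decay).
optimizer = torch.optim.SGD(
    model.parameters(), momentum=0.9, lr=0.01, weight_decay=0.0005
)
```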