Recall & Review
beginner
What is weight decay (L2 regularization) in machine learning?
Weight decay, also called L2 regularization, is a technique that adds a penalty for large weights so the model keeps them small. This helps the model avoid overfitting by favoring simpler solutions that generalize better to new data.
intermediate
How does weight decay affect the loss function during training?
Weight decay adds a term to the loss function that is proportional to the sum of the squares of the weights. This extra term encourages the model to keep weights small while still fitting the data.
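As a rough sketch of the term described above, the penalized loss is the data loss plus a coefficient times the sum of squared weights. The function name `l2_penalized_loss` and the coefficient name `lam` are illustrative and do not come from any library.

```python
def l2_penalized_loss(data_loss, weights, lam):
    """Return data_loss plus lam times the sum of squared weights."""
    penalty = sum(w * w for w in weights)
    return data_loss + lam * penalty


# Same data loss, two weight vectors: the larger weights cost more.
small = l2_penalized_loss(1.0, [0.5, -0.5], lam=0.1)  # penalty = 0.1 * 0.5
large = l2_penalized_loss(1.0, [3.0, -4.0], lam=0.1)  # penalty = 0.1 * 25.0
```

With the data loss held fixed, the optimizer can only lower the total by shrinking the weights, which is the pressure the answer describes.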
beginner
Show a simple PyTorch example of applying weight decay in an optimizer.
In PyTorch, you can add weight decay by setting the 'weight_decay' parameter in the optimizer. For example: optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)
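For context, here is a minimal sketch of one full training step around that optimizer line. The toy `nn.Linear` model, the random tensors, and the MSE loss are assumptions made only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy model and data, made up for illustration.
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)

# weight_decay is passed to the optimizer, not added to the loss.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)  # plain data loss, no penalty term
loss.backward()
optimizer.step()  # SGD adds weight_decay * w to each gradient before updating
```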
intermediate
Why is weight decay preferred over manually adding L2 penalty to the loss in PyTorch?
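Using the 'weight_decay' parameter in PyTorch optimizers is simpler and slightly more efficient: the optimizer adds the penalty's gradient (weight_decay times each weight) directly during the weight update step, so the loss function needs no extra term and autograd has no extra computation to track. The reported loss also stays equal to the data loss, which makes it easier to compare runs.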
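To see how the two approaches relate, the sketch below takes one plain SGD step (no momentum) both ways and compares the results. For plain SGD, `weight_decay=wd` matches a manual penalty of `(wd / 2) * sum(w ** 2)` over the same parameters; the toy model and data are assumptions for illustration, and this equivalence does not carry over to decoupled variants such as `torch.optim.AdamW`.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)
lr, wd = 0.1, 0.01

model_a = nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)  # identical starting weights
x, y = torch.randn(5, 3), torch.randn(5, 1)

# (a) Built-in weight decay in the optimizer.
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
nn.functional.mse_loss(model_a(x), y).backward()
opt_a.step()

# (b) Manual L2 penalty added to the loss, no weight_decay in the optimizer.
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = sum(p.pow(2).sum() for p in model_b.parameters())
(nn.functional.mse_loss(model_b(x), y) + 0.5 * wd * penalty).backward()
opt_b.step()

# Largest absolute difference between the two sets of updated parameters.
max_diff = max(
    (pa - pb).abs().max().item()
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
```

Here `max_diff` should be zero up to floating-point rounding, so for plain SGD the choice is mostly about convenience and avoiding the extra penalty computation.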
beginner
What happens if the weight decay value is set too high?
If weight decay is too high, the model weights become very small, which can cause underfitting. The model may not learn enough from the data and perform poorly on both training and test sets.
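A small pure-Python sketch of the plain SGD update rule shows why a large value is so aggressive. With the data gradient set to zero, each step multiplies the weight by `(1 - lr * weight_decay)`; the function names and the specific values are illustrative only.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One plain SGD step with weight decay folded into the gradient."""
    return w - lr * (grad + weight_decay * w)


def decay_only(w0, lr, weight_decay, steps):
    """Apply repeated steps with zero data gradient to isolate the decay."""
    w = w0
    for _ in range(steps):
        w = sgd_step(w, grad=0.0, lr=lr, weight_decay=weight_decay)
    return w


mild = decay_only(1.0, lr=0.1, weight_decay=0.01, steps=100)  # shrinks by 0.1% per step
harsh = decay_only(1.0, lr=0.1, weight_decay=5.0, steps=100)  # halves every step
```

After 100 steps `mild` is still close to 0.9, while `harsh` has collapsed to essentially zero. In real training the data gradient pushes back, but with a large enough coefficient the decay dominates and the model underfits.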
What does weight decay do to model weights during training?
A. Removes weights completely
B. Encourages weights to be smaller
C. Makes weights larger
D. Keeps weights unchanged
Weight decay adds a penalty to large weights, encouraging them to be smaller.
In PyTorch, how do you apply weight decay when creating an optimizer?
A. Normalize weights after each epoch
B. Add L2 penalty manually to the loss
C. Use a special weight_decay layer
D. Set the 'weight_decay' parameter in the optimizer
PyTorch optimizers have a 'weight_decay' parameter to apply L2 regularization automatically.
What is the main goal of using weight decay in training?
A. Increase model complexity
B. Speed up training time
C. Prevent overfitting by keeping weights small
D. Make the model memorize training data
Weight decay helps prevent overfitting by penalizing large weights.
What could happen if weight decay is set too high?
A. Model underfits and performs poorly
B. Model overfits the training data
C. Training speed increases drastically
D. Weights become very large
Too much weight decay shrinks weights too much, causing underfitting.
Which of these is NOT a benefit of using weight decay?
A. Guarantees perfect accuracy
B. Reduces overfitting
C. Keeps model weights small
D. Improves model generalization
Weight decay helps generalization but does not guarantee perfect accuracy.
Explain in your own words what weight decay (L2 regularization) is and why it is useful in training machine learning models.
Think about how adding a small cost to big weights helps the model not memorize training data.
Describe how to apply weight decay in PyTorch and why it is better to use the optimizer's weight_decay parameter instead of manually adding L2 loss.
Remember the PyTorch optimizer options and how they handle regularization internally.
Practice
1. What is the main purpose of weight decay (L2 regularization) in training a PyTorch model?
easy
A. To reduce overfitting by penalizing large weights
B. To increase the learning rate automatically
C. To add more layers to the model
D. To speed up the training process
Solution
Step 1: Understand weight decay concept
Weight decay adds a penalty to large weights during training to prevent the model from fitting noise in the data.
Step 2: Connect to overfitting reduction
By keeping weights small, the model generalizes better and avoids overfitting.
Final Answer:
To reduce overfitting by penalizing large weights -> Option A
Quick Check:
Weight decay = reduces overfitting [OK]
Hint: Weight decay shrinks weights to avoid overfitting [OK]
Common Mistakes:
Confusing weight decay with learning rate changes
Thinking weight decay adds layers
Assuming weight decay speeds training
2. Which of the following is the correct way to apply weight decay in a PyTorch optimizer?
easy
A. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, wd=0.001)
B. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, decay_weight=0.001)
C. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weightDecay=0.001)
D. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)
Solution
Step 1: Recall PyTorch optimizer syntax
PyTorch optimizers accept a parameter named weight_decay to apply L2 regularization.
Step 2: Identify correct parameter name
Only optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001) uses the exact parameter weight_decay correctly.
Final Answer:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001) -> Option D
Quick Check:
Correct parameter name is weight_decay [OK]
Hint: Use exact parameter name 'weight_decay' in optimizer [OK]
Common Mistakes:
Using wrong parameter names like decay_weight or wd
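A quick way to confirm the parameter name is to try a wrong one. In this sketch the misspelled keyword `wd` is rejected with a `TypeError`, while `weight_decay` is accepted; the toy `nn.Linear` model is an assumption for illustration.

```python
import torch
from torch import nn

model = nn.Linear(2, 1)

# A misspelled keyword is rejected when the optimizer is constructed.
try:
    torch.optim.SGD(model.parameters(), lr=0.01, wd=0.001)
    error = None
except TypeError as exc:
    error = exc

# The exact name weight_decay is accepted.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)
```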
But your model is overfitting badly. What is a likely mistake?
medium
A. Weight decay value is too high, causing poor training
B. Weight decay should be set to zero to reduce overfitting
C. Weight decay is applied to biases by default, so overfitting remains
D. Learning rate is too low to affect weight decay
Solution
Step 1: Recall weight decay behavior in PyTorch
By default, weight decay is applied to all parameters, including biases and batch norm weights, unless explicitly excluded.
Step 2: Understand overfitting cause
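If weight decay is applied to every parameter, biases and normalization parameters are shrunk along with the weight matrices. Decaying them does little to limit model capacity and can interfere with training, so the regularization may be less effective than expected; a common practice is to exclude them from weight decay.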
Final Answer:
Weight decay is applied to biases by default, so overfitting remains -> Option C
Quick Check:
Biases often excluded from weight decay for better regularization [OK]
Hint: Check if weight decay excludes biases to reduce overfitting [OK]
Common Mistakes:
Assuming weight decay does not apply to biases
Setting weight decay to zero to fix overfitting
Blaming learning rate for weight decay issues
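The default behavior can be checked by inspecting the optimizer's parameter groups. This sketch assumes a small made-up `nn.Sequential` model; it shows that passing `model.parameters()` produces a single group whose `weight_decay` covers the biases too.

```python
import torch
from torch import nn

# Made-up model with two linear layers and a batch norm layer.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)

# Passing model.parameters() yields one group holding every parameter.
group = optimizer.param_groups[0]
decayed_ids = {id(p) for p in group["params"]}

# Collect the bias parameters and check they are all in the decayed group.
bias_names = [n for n, _ in model.named_parameters() if n.endswith("bias")]
all_biases_decayed = all(
    id(p) in decayed_ids
    for n, p in model.named_parameters()
    if n.endswith("bias")
)
```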
5. You want to apply weight decay only to the weights of a PyTorch model's linear layers but not to biases. Which code snippet correctly sets this up?
hard
A. optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)
B. params = [
       {'params': [p for n, p in model.named_parameters() if 'weight' in n], 'weight_decay': 0.01},
       {'params': [p for n, p in model.named_parameters() if 'bias' in n], 'weight_decay': 0.0}
   ]
   optimizer = torch.optim.Adam(params, lr=0.001)
C. optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0)
Solution
Step 1: Identify the parameter groups
To apply weight decay only to weights, separate the parameters into two groups: one with weight decay and one without.
Step 2: Check code correctness
params = [
    {'params': [p for n, p in model.named_parameters() if 'weight' in n], 'weight_decay': 0.01},
    {'params': [p for n, p in model.named_parameters() if 'bias' in n], 'weight_decay': 0.0}
]
optimizer = torch.optim.Adam(params, lr=0.001)
This creates two groups: weights with weight_decay=0.01 and biases with weight_decay=0.0, correctly excluding biases.
Final Answer:
Code snippet that separates weights and biases with different weight_decay values -> Option B
Quick Check:
Separate params for weight decay control [OK]
Hint: Group parameters by name to apply weight decay selectively [OK]
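Grouping by name also puts normalization-layer scales into the decayed group, because their parameter is named `weight` too. A common alternative, sketched here as an assumption rather than as the quiz's intended answer, is to group by tensor dimensionality: 1-D parameters (biases and normalization scales/shifts) get no decay. The helper name `split_decay_groups` and the toy model are hypothetical.

```python
import torch
from torch import nn


def split_decay_groups(model, weight_decay):
    """Build optimizer param groups: decay for >=2-D tensors, none for 1-D."""
    decay, no_decay = [], []
    for _, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # 1-D tensors are biases and normalization scale/shift parameters.
        if param.ndim <= 1:
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Made-up model: two linear layers with a layer norm in between.
model = nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Linear(8, 1))
optimizer = torch.optim.Adam(split_decay_groups(model, 0.01), lr=0.001)
```

In this toy model the decayed group holds only the two linear weight matrices, while the two linear biases and the layer norm's scale and shift land in the group with `weight_decay=0.0`.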