Recall & Review
beginner
What is weight decay (L2 regularization) in machine learning?
Weight decay, also called L2 regularization, is a technique that adds a penalty to large weights in a model to keep them small. This helps the model avoid overfitting by making it simpler and more general.
intermediate
How does weight decay affect the loss function during training?
Weight decay adds a term to the loss function that is proportional to the sum of the squares of the weights. This extra term encourages the model to keep weights small while still fitting the data.
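The extra loss term can be sketched in plain Python (the weights, the lambda value, and the data loss below are made-up toy numbers, not from any real model):

```python
# Hypothetical toy model with two weights.
weights = [3.0, -2.0]
lam = 0.01  # regularization strength (lambda); an assumed value

def l2_penalty(ws, lam):
    """L2 term added to the loss: lambda * sum of squared weights."""
    return lam * sum(w * w for w in ws)

data_loss = 0.5  # pretend loss from fitting the data
total = data_loss + l2_penalty(weights, lam)
print(total)  # 0.5 + 0.01 * (9 + 4) = 0.63
```

Because the penalty grows with the square of each weight, large weights are punished much more heavily than small ones, which is what pushes the model toward smaller weights.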
beginner
Show a simple PyTorch example of applying weight decay in an optimizer.
In PyTorch, you can add weight decay by setting the 'weight_decay' parameter in the optimizer. For example: optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)
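A minimal runnable sketch of the example above, using a tiny linear model and made-up random data:

```python
import torch

# Tiny model; weight decay is set once, in the optimizer.
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)

x = torch.randn(4, 2)  # toy inputs
y = torch.randn(4, 1)  # toy targets

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()  # SGD adds weight_decay * w to each gradient before updating
```

Note that nothing about the penalty appears in the loss computation; the optimizer handles it during the update step.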
intermediate
Why is weight decay preferred over manually adding L2 penalty to the loss in PyTorch?
Using the 'weight_decay' parameter lets the optimizer add the penalty gradient (weight_decay * w) directly during the update step, so no extra term has to be built into the loss and backpropagated. For plain SGD this is mathematically equivalent to adding an L2 term to the loss, but it is cheaper and less error-prone than writing the penalty by hand.
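The equivalence for SGD can be checked directly. In the sketch below (toy values; the base data loss is set to zero so only the penalty acts), `weight_decay=wd` in the optimizer matches a manual loss term of (wd / 2) * ||w||², and both produce the same updated weight:

```python
import torch

# Two identical parameters, starting at the same value.
p1 = torch.nn.Parameter(torch.tensor([1.0]))
p2 = torch.nn.Parameter(torch.tensor([1.0]))
lr, wd = 0.1, 0.1

# Option A: let the optimizer apply the decay (no penalty in the loss).
opt1 = torch.optim.SGD([p1], lr=lr, weight_decay=wd)
(p1 * 0.0).sum().backward()  # zero data loss for a clean comparison
opt1.step()

# Option B: add the L2 penalty to the loss by hand.
# SGD's weight_decay=wd matches a loss term of (wd / 2) * ||w||^2.
opt2 = torch.optim.SGD([p2], lr=lr)
((p2 * 0.0).sum() + (wd / 2) * (p2 ** 2).sum()).backward()
opt2.step()

print(p1.item(), p2.item())  # both 0.99 = 1 - lr * wd * 1
```

Option A skips building and backpropagating the penalty term, which is why the built-in parameter is preferred.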
beginner
What happens if the weight decay value is set too high?
If weight decay is too high, the model weights become very small, which can cause underfitting. The model may not learn enough from the data and perform poorly on both training and test sets.
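The shrinking effect is easy to see in isolation. In this sketch (deliberately extreme toy values; the data gradient is set to zero so only the decay acts), each SGD step multiplies the weight by (1 - lr * weight_decay) = 0.5, so the weight collapses toward zero:

```python
import torch

p = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([p], lr=0.1, weight_decay=5.0)  # deliberately too large

for _ in range(20):
    opt.zero_grad()
    (p * 0.0).sum().backward()  # zero data gradient; only the decay acts
    opt.step()                  # each step halves p: p *= (1 - lr * wd)

print(p.item())  # ~1e-6: the weight has been driven almost to zero
```

With real data the gradient of the loss would fight this shrinkage, but if the decay term dominates, the weights stay too small to fit the data, which is underfitting.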
What does weight decay do to model weights during training?
Weight decay adds a penalty to large weights, encouraging them to be smaller.
In PyTorch, how do you apply weight decay when creating an optimizer?
PyTorch optimizers have a 'weight_decay' parameter to apply L2 regularization automatically.
What is the main goal of using weight decay in training?
Weight decay helps prevent overfitting by penalizing large weights.
What could happen if weight decay is set too high?
Too much weight decay shrinks weights too much, causing underfitting.
Which of these is NOT a benefit of using weight decay?
Weight decay helps generalization but does not guarantee perfect accuracy.
Explain in your own words what weight decay (L2 regularization) is and why it is useful in training machine learning models.
Think about how adding a small cost to big weights helps the model not memorize training data.
Describe how to apply weight decay in PyTorch and why it is better to use the optimizer's weight_decay parameter instead of manually adding L2 loss.
Remember the PyTorch optimizer options and how they handle regularization internally.