Recall & Review
beginner
What is overfitting in machine learning?
Overfitting happens when a model learns the training data too well, including noise and details that don't apply to new data. This makes the model perform poorly on new, unseen data.
beginner
What does regularization do in a machine learning model?
Regularization adds a penalty for model complexity to the loss, which encourages smaller weights or a simpler model. This helps the model generalize to new data instead of fitting noise.
intermediate
How does L2 regularization (weight decay) work in PyTorch?
L2 regularization adds a penalty proportional to the sum of squared weights to the loss function. In PyTorch, this is usually done by setting the weight_decay parameter of the optimizer, which shrinks the weights slightly at every update step.
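To see why a decay setting in the optimizer behaves like an L2 penalty, here is a short derivation for plain SGD. The symbols are assumptions introduced for illustration: $\eta$ is the learning rate, $\lambda$ is the weight_decay value, and $L$ is the loss without regularization. The factor of 1/2 is a convention chosen so that the gradient of the penalty is $\lambda w$, which matches the `weight_decay * w` term that PyTorch's SGD adds to the gradient.

```latex
\begin{aligned}
\tilde{L}(w) &= L(w) + \frac{\lambda}{2}\,\lVert w \rVert_2^2 \\
\nabla \tilde{L}(w) &= \nabla L(w) + \lambda w \\
w &\leftarrow w - \eta\,\nabla \tilde{L}(w) = (1 - \eta\lambda)\,w - \eta\,\nabla L(w)
\end{aligned}
```

Each step multiplies the weights by $(1 - \eta\lambda)$ before the usual gradient update, which is where the name "weight decay" comes from. For adaptive optimizers such as Adam the correspondence is looser, because the decay term is rescaled together with the gradient; `torch.optim.AdamW` applies the decay separately.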
beginner
Why does regularization reduce overfitting?
Regularization limits how complex the model can get by penalizing large weights. This stops the model from memorizing training data noise and helps it learn patterns that work well on new data.
intermediate
What is the difference between L1 and L2 regularization?
L1 regularization adds the absolute values of weights to the loss, encouraging sparsity (some weights become zero). L2 adds squared weights, encouraging smaller weights but not necessarily zero.
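The difference is easy to check by hand on a small example. The sketch below computes both penalties and their gradients for a made-up weight vector; the values in `w` are hypothetical and chosen only to keep the arithmetic simple.

```python
import torch

# Hypothetical weight vector, chosen so the arithmetic is easy to verify.
w = torch.tensor([0.5, -2.0, 0.0, 1.5])

l1_penalty = w.abs().sum()    # 0.5 + 2.0 + 0.0 + 1.5 = 4.0
l2_penalty = w.pow(2).sum()   # 0.25 + 4.0 + 0.0 + 2.25 = 6.5

# Gradient of each penalty with respect to w:
# L1 pulls every nonzero weight by a constant amount (its sign),
# L2 pulls in proportion to the size of the weight.
l1_grad = torch.sign(w)       # [1, -1, 0, 1]
l2_grad = 2 * w               # [1, -4, 0, 3]

print(l1_penalty.item(), l2_penalty.item())
```

Because the L1 pull stays the same size however small a weight gets, it can drive weights all the way to zero. The L2 pull fades as the weight shrinks, so weights get small but rarely reach exactly zero.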
What problem does regularization mainly help to solve?
A. Underfitting
B. Slow training
C. Data imbalance
D. Overfitting
Regularization helps prevent overfitting by controlling model complexity.
In PyTorch, how do you apply L2 regularization?
A. Set weight_decay in the optimizer
B. Add dropout layers
C. Use batch normalization
D. Increase learning rate
L2 regularization is applied by setting the weight_decay parameter in the optimizer.
Which regularization method encourages some weights to become exactly zero?
A. Early stopping
B. L1 regularization
C. Dropout
D. L2 regularization
L1 regularization encourages sparsity by pushing some weights to zero.
Why does a model with very large weights tend to overfit?
A. It memorizes noise in training data
B. It trains faster
C. It ignores training data
D. It has fewer parameters
Large weights let small changes in the input produce large changes in the output, which makes it easier for the model to memorize noise in the training data and overfit.
What is a simple way to explain regularization to a friend?
A. It removes data from training
B. It makes the model bigger
C. It keeps the model simple so it works well on new data
D. It increases training time
Regularization keeps the model simple to help it generalize better.
Explain in your own words why regularization helps control overfitting in machine learning models.
Think about how adding a penalty changes the model's learning.
Describe how you would add L2 regularization to a PyTorch model training process.
Focus on the optimizer settings in PyTorch.
Practice
1. Why does regularization help prevent overfitting in a PyTorch model?
easy
A. It keeps the model weights small by adding a penalty to the loss.
B. It increases the size of the training dataset automatically.
C. It removes layers from the neural network during training.
D. It speeds up the training process by skipping some data points.
Solution
Step 1: Understand what overfitting means
Overfitting happens when a model learns the training data too well, including noise, causing poor performance on new data.
Step 2: Explain how regularization affects model weights
Regularization adds a penalty to large weights, encouraging smaller weights that generalize better to new data.
Final Answer:
It keeps the model weights small by adding a penalty to the loss. -> Option A
Quick Check:
Regularization = penalty on weights = less overfitting [OK]
Hint: Regularization adds penalty to weights to reduce overfitting [OK]
Common Mistakes:
Thinking regularization increases data size
Believing regularization removes layers
Assuming regularization speeds training
2. Which PyTorch code snippet correctly applies L2 regularization (weight decay) during optimizer setup?
easy
A. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.1)
B. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, dropout=0.1)
C. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.1)
D. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, decay=0.1)
Solution
Step 1: Identify correct parameter for L2 regularization in PyTorch
PyTorch uses weight_decay in optimizers to apply L2 regularization.
Step 2: Check the code options for correct usage
Only option C passes weight_decay=0.1, which is the correct way to add L2 regularization. momentum is a valid SGD argument but has nothing to do with regularization, and dropout and decay are not SGD arguments at all.
Final Answer:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.1) -> Option C
Quick Check:
weight_decay = L2 regularization in PyTorch [OK]
Hint: Use weight_decay param for L2 regularization in PyTorch optimizers [OK]
Common Mistakes:
Using dropout parameter in optimizer
Confusing momentum with regularization
Using decay instead of weight_decay
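A quick way to confirm which keywords the optimizer accepts is to try them. The sketch below is only an illustration: the `Linear` model is a hypothetical stand-in, and the loop simply records which of the wrong keywords from the options are rejected.

```python
import torch

model = torch.nn.Linear(4, 2)  # hypothetical tiny model for illustration

# weight_decay is a real keyword of torch.optim.SGD and is stored per parameter group.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.1)
stored_decay = optimizer.param_groups[0]["weight_decay"]

# 'dropout' and 'decay' are not optimizer keywords, so construction fails.
rejected = []
for bad_kwarg in ("dropout", "decay"):
    try:
        torch.optim.SGD(model.parameters(), lr=0.01, **{bad_kwarg: 0.1})
    except TypeError:
        rejected.append(bad_kwarg)

print(stored_decay, rejected)
```

Note that momentum=0.1 from option A would be accepted without complaint, which is why it is an easy mistake to make: the code runs, but no regularization is applied.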
3. Consider this PyTorch training loop snippet with L2 regularization applied:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)
for data, target in dataloader:
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
What effect does the weight_decay=0.01 have during training?
medium
A. It adds a penalty to large weights, helping reduce overfitting.
B. It increases the learning rate by 0.01 each step.
C. It drops 1% of neurons randomly during training.
D. It stops training early when loss is below 0.01.
Solution
Step 1: Understand weight_decay in optimizer
The weight_decay parameter adds L2 regularization, penalizing large weights during training.
Step 2: Identify the effect on training
This penalty helps the model avoid overfitting by keeping weights smaller and more generalizable.
Final Answer:
It adds a penalty to large weights, helping reduce overfitting. -> Option A
Quick Check:
weight_decay = L2 penalty = less overfitting [OK]
Hint: weight_decay adds penalty to weights, not learning rate or dropout [OK]
Common Mistakes:
Confusing weight_decay with learning rate changes
Thinking weight_decay is dropout
Assuming weight_decay controls early stopping
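The shrinking effect can be isolated with a loss whose gradient is zero, so that the only thing changing the weights is the decay term. This sketch uses SGD rather than the Adam optimizer from the question, because the SGD arithmetic is easy to follow by hand; Adam rescales the update, so its numbers would differ. The parameter values, learning rate, and decay strength are all made up for illustration.

```python
import torch

# A single hypothetical parameter tensor; no model is needed for this check.
w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
optimizer = torch.optim.SGD([w], lr=0.1, weight_decay=0.5)

optimizer.zero_grad()
loss = (w * 0.0).sum()   # a data loss whose gradient is exactly zero
loss.backward()
optimizer.step()

# Only the decay term acted: w <- w - lr * weight_decay * w = 0.95 * w
print(w.data)
```

Both weights move toward zero by the same 5% even though the data loss gave no signal at all. In real training this pull competes with the data gradient, so only weights that genuinely reduce the loss stay large.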
4. You have this PyTorch code snippet intended to apply L2 regularization:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for data, target in dataloader:
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target) + 0.01 * torch.sum(model.parameters())
    loss.backward()
    optimizer.step()
What is wrong with this code regarding regularization?
medium
A. It uses SGD optimizer which does not support regularization.
B. It forgets to call optimizer.zero_grad() before backward.
C. It applies regularization after optimizer.step(), so no effect.
D. It incorrectly sums parameters instead of their squares for L2 penalty.
Solution
Step 1: Check how L2 regularization is computed
L2 regularization requires summing the squares of parameters, not just their values.
Step 2: Analyze the code's regularization term
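The code passes model.parameters() straight to torch.sum, so nothing is squared. It would also fail at runtime, because model.parameters() returns a generator rather than a tensor. A correct L2 term squares each parameter tensor, sums its entries, and adds the results together.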
Final Answer:
It incorrectly sums parameters instead of their squares for L2 penalty. -> Option D
Quick Check:
L2 penalty = sum of squares, not sum of values [OK]
Hint: L2 regularization sums squares of weights, not weights themselves [OK]
Common Mistakes:
Summing parameters instead of squared parameters
Thinking SGD can't use regularization
Misplacing optimizer.zero_grad() call
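For comparison, here is one way the manual penalty could be written so that it runs and actually penalizes squared weights. It is a sketch under stated assumptions, not the only correct form: the model, the random data, and the name `l2_lambda` are hypothetical, and a single batch stands in for the dataloader loop.

```python
import torch

torch.manual_seed(0)

# Hypothetical model, data, and penalty strength, for illustration only.
model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss()
data, target = torch.randn(8, 3), torch.randn(8, 1)
l2_lambda = 0.01

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

optimizer.zero_grad()
output = model(data)
# Square each parameter tensor, sum its entries, then add up across tensors.
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = loss_fn(output, target) + l2_lambda * l2_penalty
loss.backward()
optimizer.step()
```

With plain SGD, a coefficient of `l2_lambda` on the summed squares corresponds to `weight_decay = 2 * l2_lambda` in the optimizer, since the gradient of the squared term carries a factor of 2. Like weight_decay, this version also penalizes the bias, because it loops over every parameter.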
5. You train two PyTorch models on the same dataset: Model A uses no regularization, Model B uses L2 regularization with weight_decay=0.05. After training, Model A has training accuracy 98% but test accuracy 70%, while Model B has training accuracy 90% and test accuracy 85%. What explains this difference?
hard
A. Model A's higher training accuracy means it will always perform better on test data.
B. Model B's regularization reduced overfitting by keeping weights smaller, improving test accuracy.
C. Model B used a larger learning rate, causing better generalization.
D. Model A trained longer, so it has better test accuracy.
Solution
Step 1: Compare training and test accuracies
Model A fits training data very well but performs poorly on test data, indicating overfitting.
Step 2: Understand effect of L2 regularization on Model B
Model B has lower training accuracy but better test accuracy because regularization keeps weights smaller, improving generalization.
Final Answer:
Model B's regularization reduced overfitting by keeping weights smaller, improving test accuracy. -> Option B
Quick Check:
Regularization = smaller weights = better test accuracy [OK]
Hint: Better test accuracy with regularization means less overfitting [OK]
Common Mistakes:
Assuming higher training accuracy means better test accuracy
Confusing learning rate with regularization effect
Ignoring the role of weight size in generalization
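One simple way to read the numbers in this question is to compute the gap between training and test accuracy for each model. The sketch below uses only the figures quoted in the question; the dictionary layout and labels are just an illustrative choice.

```python
# Accuracies quoted in the question, in percent.
results = {
    "Model A (no regularization)": {"train": 98, "test": 70},
    "Model B (weight_decay=0.05)": {"train": 90, "test": 85},
}

# Generalization gap: accuracy lost when moving from training data to test data.
gaps = {name: acc["train"] - acc["test"] for name, acc in results.items()}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap} points")
```

Model A loses 28 points between training and test, while Model B loses only 5. A large gap like Model A's is the usual sign of overfitting, and the smaller gap for Model B is consistent with regularization helping it generalize.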