Imagine you have a model that learns to predict house prices. You add L2 regularization (weight decay) during training. What is the main reason this helps reduce overfitting?
Think about how smaller weights affect the model's ability to fit complex patterns.
L2 regularization adds a penalty proportional to the squared magnitude of the weights, encouraging the model to keep them small. Smaller weights constrain the model's effective capacity, so it is less able to fit noise in the training data, which reduces overfitting.
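A minimal sketch of this equivalence: PyTorch's `weight_decay` argument in `SGD` adds `wd * w` to each gradient, which is the same as adding `(wd / 2) * ||w||^2` to the loss by hand. The two toy models below start from identical weights and end up identical after training (the data and hyperparameters are illustrative, not from the original snippet).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(20, 10)
y = torch.randn(20, 1)

# Model A: L2 regularization via the optimizer's built-in weight_decay.
model_a = nn.Linear(10, 1)
# Model B: identical initialization, manual L2 penalty added to the loss.
model_b = nn.Linear(10, 1)
model_b.load_state_dict(model_a.state_dict())

opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1, weight_decay=0.1)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)
criterion = nn.MSELoss()

for _ in range(5):
    opt_a.zero_grad()
    criterion(model_a(x), y).backward()
    opt_a.step()

    opt_b.zero_grad()
    loss = criterion(model_b(x), y)
    # weight_decay=0.1 is equivalent to adding (0.1 / 2) * ||w||^2 to the loss.
    l2 = sum((p ** 2).sum() for p in model_b.parameters())
    (loss + 0.05 * l2).backward()
    opt_b.step()
```

Both runs penalize large weights identically, which is why the "keep weights small" intuition and the optimizer flag are two views of the same mechanism.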
Consider this PyTorch training loop snippet. What will be the difference in training loss behavior when dropout is enabled versus disabled?
```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, dropout_rate=0.0):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

model = SimpleNet(dropout_rate=0.5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.randn(20, 10)
y = torch.randn(20, 1)

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
```
Dropout randomly disables neurons during training. How does this affect training loss?
Dropout prevents the model from relying too heavily on any single neuron. Because a random subset of activations is zeroed on each forward pass, training loss is typically somewhat higher and noisier than without dropout, but the model generalizes better and is less prone to overfitting.
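To see the mechanism concretely, here is a small standalone sketch (not part of the original snippet) of what `nn.Dropout` does to activations in training mode: with `p=0.5`, roughly half the entries are zeroed, and the survivors are scaled by `1/(1-p) = 2` so the expected activation is unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
drop.train()  # dropout is only active in training mode

x = torch.ones(1000)
out = drop(x)

# Each entry is zeroed with probability p = 0.5; survivors are
# scaled by 1/(1-p) = 2, so every output is either 0.0 or 2.0.
zeroed_fraction = (out == 0).float().mean().item()
```

This random masking is exactly why the per-batch training loss fluctuates: a different sub-network is trained on every step.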
You are training a neural network with L2 regularization (weight decay). What is the effect of setting the weight decay parameter too high?
Think about what happens if the penalty on weights is very strong.
If the weight decay is set too high, the penalty term dominates the data loss and forces the weights to shrink toward zero, limiting the model's ability to learn even genuine patterns. The result is underfitting: poor accuracy on both the training and validation data.
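A hedged illustration of that failure mode, on synthetic data with a known linear signal (the dataset, decay values, and training schedule here are illustrative choices, not prescribed settings): a mild decay lets the model recover weights close to the true ones, while an excessive decay crushes them toward zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(200, 10)
w_true = torch.randn(10, 1)
y = x @ w_true + 0.1 * torch.randn(200, 1)  # linear signal plus small noise
criterion = nn.MSELoss()

def train(weight_decay):
    torch.manual_seed(1)  # identical initialization for both runs
    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=weight_decay)
    for _ in range(50):
        opt.zero_grad()
        criterion(model(x), y).backward()
        opt.step()
    return model.weight.norm().item()

norm_mild = train(weight_decay=0.01)  # mild penalty: weights approach w_true
norm_huge = train(weight_decay=5.0)   # excessive penalty: weights shrink toward zero
```

With the excessive decay the learned weight norm ends up far below what the data actually calls for, which is precisely the underfitting the answer describes.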
You train two models on the same data: one with regularization and one without. Both have similar training loss, but the validation loss of the regularized model is lower. What does this indicate?
Think about what validation loss tells us about model performance on new data.
Similar training loss but lower validation loss means the regularized model generalizes better to unseen data: it avoids overfitting, whereas the unregularized model has likely memorized noise in the training set.
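A minimal sketch of how such a comparison would be measured (the model, data, and split sizes here are placeholders): evaluate both losses with the model in `eval()` mode and gradients disabled, then compare the gap.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)  # placeholder for a trained model
criterion = nn.MSELoss()
x_train, y_train = torch.randn(80, 10), torch.randn(80, 1)
x_val, y_val = torch.randn(20, 10), torch.randn(20, 1)

model.eval()               # evaluation mode: dropout off, no stochastic behavior
with torch.no_grad():      # no gradient tracking needed for measurement
    train_loss = criterion(model(x_train), y_train).item()
    val_loss = criterion(model(x_val), y_val).item()
```

A large gap (`val_loss` much higher than `train_loss`) is the usual signature of overfitting; regularization narrows that gap, which is what the question's scenario shows.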
Look at this PyTorch model training code. Dropout is added but the model still overfits badly. What is the most likely reason?
```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(20, 50)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

model = Net()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x_train = torch.randn(100, 20)
y_train = torch.randn(100, 1)

for epoch in range(10):
    model.eval()
    optimizer.zero_grad()
    output = model(x_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
```
Check the mode the model is in during training and how dropout behaves in different modes.
Dropout is only active in training mode. Calling model.eval() before the forward pass disables it, so the network receives no regularization and overfits. The fix is to call model.train() inside the training loop and reserve model.eval() for evaluation.
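A small standalone check (separate from the snippet above) makes the mode difference observable: in `eval()` mode, `nn.Dropout` is the identity function and passes its input through unchanged, while in `train()` mode it zeroes entries at random.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.randn(100)

drop.eval()              # eval mode: dropout is a no-op (identity)
out_eval = drop(x)

torch.manual_seed(0)
drop.train()             # train mode: entries are randomly zeroed and rescaled
out_train = drop(x)
```

This is why a training loop that mistakenly runs in eval mode gets no regularization benefit from its dropout layers.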