Imagine you are training a model using gradient descent. What happens if the learning rate is set too high?
Think about what happens if you take very large steps downhill on a hill.
A high learning rate causes the weight updates to overshoot the minimum repeatedly, so the loss oscillates or grows instead of settling, preventing convergence and making training unstable.
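To make the overshooting concrete, here is a minimal sketch (not part of the original question) of gradient descent on the one-dimensional function f(w) = w², whose gradient is 2w. Each update is w ← w − lr·2w, i.e. w is multiplied by (1 − 2·lr), so any lr above 1.0 flips the sign and grows |w| every step:

```python
# Gradient descent on f(w) = w^2 (gradient: 2w), starting from w = 1.0.
# With lr = 1.1 the per-step multiplier is (1 - 2*1.1) = -1.2, so |w|
# grows instead of shrinking toward the minimum at 0.
def descend(lr, steps=10, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w  # standard gradient descent update
    return w

print(abs(descend(lr=1.1)))  # grows each step: divergence
print(abs(descend(lr=0.1)))  # shrinks toward 0: convergence
```

The same mechanism happens per-coordinate in a neural network: when the step size exceeds the stable range for the curvature, updates bounce back and forth across the minimum with increasing amplitude.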
Consider the following PyTorch training loop snippet. What will be the printed loss trend if the learning rate is set too low?
```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.00001)

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(x)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.6f}")
```
Think about what happens if the steps taken to minimize loss are very small.
A very low learning rate produces very small weight updates, so the printed loss decreases only slightly and appears almost flat across the five epochs.
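The same effect can be seen on the toy function f(w) = w²: with lr = 0.00001 (the value in the snippet above), five updates barely move w, so the loss is essentially unchanged. This is an illustrative sketch, not part of the original question:

```python
# Gradient descent on f(w) = w^2 with a very small learning rate.
# After 5 steps from w = 1.0, w has barely moved, so the loss f(w)
# is nearly constant -- mirroring the near-flat printed loss above.
w = 1.0
lr = 0.00001
for epoch in range(5):
    w = w - lr * 2 * w  # tiny update each step
    print(f"Epoch {epoch+1}, w = {w:.6f}, loss = {w*w:.6f}")
```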
You want your model to converge faster and avoid getting stuck in local minima. Which learning rate strategy below is best suited?
Think about starting with bigger steps and then taking smaller steps as you get closer to the goal.
Gradually decreasing the learning rate helps the model take big steps early on and fine-tune weights later, improving convergence and avoiding local minima.
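In PyTorch, one common way to implement this strategy is a step-decay scheduler such as `torch.optim.lr_scheduler.StepLR`. The snippet below is a minimal sketch; the `step_size=3` and `gamma=0.1` values are arbitrary illustrative choices, and the loss is a placeholder standing in for a real training objective:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by 0.1 every 3 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(9):
    optimizer.zero_grad()
    loss = model(torch.tensor([[1.0]])).pow(2).mean()  # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule at the end of each epoch
    print(f"Epoch {epoch+1}, lr = {scheduler.get_last_lr()[0]:.5f}")
```

The printed learning rate starts at 0.1 and drops by a factor of 10 every three epochs, giving big early steps and progressively finer updates later.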
Which learning rate value is most likely to cause unstable training and divergence when training a neural network?
Consider typical learning rates used in practice and what happens if the rate is too large.
A learning rate of 0.1 is often too large for many models, causing weight updates to overshoot and training to diverge.
During training, you observe the following loss values over epochs with a learning rate scheduler that reduces the rate every 3 epochs:
Epoch 1: 0.80
Epoch 2: 0.60
Epoch 3: 0.50
Epoch 4: 0.48
Epoch 5: 0.45
Epoch 6: 0.44
What does this pattern suggest about the effect of the learning rate scheduler on convergence?
Look at how the loss changes before and after epoch 3 when the learning rate changes.
The loss decreases faster initially with a higher learning rate, then slows down as the learning rate reduces, which helps fine-tune the model and stabilize convergence.
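A pattern like this can be reproduced with a small sketch combining the earlier regression snippet with a `StepLR` scheduler (the seed, learning rate, and schedule values are illustrative assumptions, so the exact loss numbers will not match the table above):

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # assumption: fixed seed for repeatability
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Reduce the learning rate by 10x every 3 epochs, as in the scenario above.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

losses = []
for epoch in range(6):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
    losses.append(loss.item())
    print(f"Epoch {epoch+1}, lr = {scheduler.get_last_lr()[0]:.3f}, "
          f"loss = {loss.item():.4f}")
```

The early epochs (higher learning rate) show large drops in loss, while the later epochs (reduced learning rate) show small, stable improvements, matching the qualitative pattern in the table.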