Imagine you are walking down a hill to reach the lowest point. In gradient descent, what does the learning rate control?
Think about how big or small your steps are when walking down.
The learning rate controls how big each step is when moving toward the minimum. Steps that are too big can overshoot the minimum; steps that are too small make progress slow.
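A minimal sketch can make the step-size effect concrete. The quadratic f(w) = w**2 and the specific values below are assumptions for illustration, not part of the question:

```python
# Gradient descent on f(w) = w**2, whose gradient is 2*w.
# The learning rate scales each step toward the minimum at w = 0.
def descend(learning_rate, steps=10, w=5.0):
    for _ in range(steps):
        gradient = 2 * w                     # derivative of w**2
        w = w - learning_rate * gradient     # one update step
    return w

small = descend(0.01)   # tiny steps: after 10 steps, still far from 0
good = descend(0.1)     # balanced steps: much closer to 0
```

After the same number of steps, the larger (but still moderate) rate ends up noticeably closer to the minimum.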
What is the value of w after one gradient descent update?
w = 2.0
learning_rate = 0.1
gradient = 3.0
w = w - learning_rate * gradient
print(round(w, 2))
Use the formula: new_w = old_w - learning_rate * gradient
Calculate: 2.0 - 0.1 * 3.0 = 2.0 - 0.3 = 1.7
Which learning rate is most likely to cause gradient descent to converge smoothly to the minimum?
Too small means slow progress, too large means overshooting.
A learning rate of 0.1 is balanced: large enough to make steady progress, small enough not to overshoot. Rates like 1.0 or 10.0 are too large and typically cause oscillation or divergence, while 0.0001 converges very slowly.
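The four candidate rates can be compared on a small sketch. The quadratic f(w) = w**2 is an assumed example function, not part of the question:

```python
# Run gradient descent on f(w) = w**2 with each candidate learning rate.
# For this function each step multiplies w by (1 - 2 * learning_rate).
def final_w(learning_rate, steps=20, w=1.0):
    for _ in range(steps):
        w = w - learning_rate * (2 * w)  # gradient of w**2 is 2*w
    return w

for lr in (0.0001, 0.1, 1.0, 10.0):
    print(lr, final_w(lr))
# 0.0001 barely moves, 0.1 approaches 0, 1.0 bounces between +1 and -1,
# and 10.0 blows up to a huge magnitude.
```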
Which training loss curve shape best represents a too-large learning rate during gradient descent?
Think about what happens if steps are too big and jump around the minimum.
A too-large learning rate makes the loss jump back and forth across the minimum, so the curve oscillates or even increases instead of decreasing smoothly.
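The two curve shapes can be generated with a short sketch; the quadratic loss f(w) = w**2 and the rates chosen are illustrative assumptions:

```python
# Record the loss f(w) = w**2 after each gradient descent step.
def loss_curve(learning_rate, steps=5, w=1.0):
    losses = []
    for _ in range(steps):
        w = w - learning_rate * (2 * w)  # gradient of w**2 is 2*w
        losses.append(w ** 2)
    return losses

smooth = loss_curve(0.1)  # losses shrink every step
jumpy = loss_curve(1.1)   # w overshoots and flips sign; losses grow
```

Plotting `smooth` gives the steadily decreasing curve; `jumpy` rises step after step because each update overshoots the minimum by more than the previous one.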
What error will this code produce when run?
w = 1.0
learning_rate = 0.05
gradient = None
w = w - learning_rate * gradient
print(w)
Check the type of gradient before multiplication.
Multiplying a float by None raises a TypeError, because None does not support arithmetic operations.
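The failure and one defensive fix can be sketched as follows; the `is not None` guard is an assumption about how one might handle a missing gradient, not part of the original snippet:

```python
w = 1.0
learning_rate = 0.05
gradient = None

# Reproducing the failure: float * None raises TypeError.
try:
    w = w - learning_rate * gradient
except TypeError as err:
    print("update skipped:", err)

# One possible guard: only update when a gradient is available.
if gradient is not None:
    w = w - learning_rate * gradient
print(w)  # w is unchanged: 1.0
```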