Imagine you are teaching a robot to recognize apples and oranges. Why do we change the robot's internal settings (weights) during training?
Think about how learning from errors helps improve skills.
Training updates weights so the model reduces errors and improves predictions over time.
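This idea can be sketched in a few lines of plain Python: one gradient-descent step on a single example with squared-error loss. All names here are illustrative, not tied to any library.

```python
# One gradient-descent step on a single example y = w * x,
# using squared error as the loss.
w = 0.0            # initial weight
x, y_true = 1.0, 2.0
lr = 0.1           # learning rate

y_pred = w * x
loss_before = (y_true - y_pred) ** 2      # (2 - 0)^2 = 4.0
grad = -2 * (y_true - y_pred) * x         # dLoss/dw = -4.0
w -= lr * grad                            # w moves to 0.4

loss_after = (y_true - w * x) ** 2        # (2 - 0.4)^2 ≈ 2.56
print(loss_before, w, loss_after)         # loss shrinks after the update
```

The update moves the weight in the direction that reduces the error, which is exactly what training does at scale.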
Given a simple model and one training step, what is the loss value printed?
import tensorflow as tf

# Simple linear model y = wx + b
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])

# Mean squared error loss
loss_fn = tf.keras.losses.MeanSquaredError()

# Optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# Input and true output
x = tf.constant([[1.0]])
y_true = tf.constant([[2.0]])

with tf.GradientTape() as tape:
    y_pred = model(x)
    loss = loss_fn(y_true, y_pred)

grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

print(round(float(loss), 3))
Check the initial prediction before weight update.
The bias starts at zero, but the Dense layer's kernel uses random (Glorot uniform) initialization by default, so the prediction, and therefore the printed loss, varies from run to run. If the kernel were initialized to zeros, the prediction would be 0 and the loss would be (2 - 0)^2 = 4.0 (mean squared error over one sample divides by 1). Either way, the printed value is the loss computed inside the tape, i.e. before the weight update is applied.
You want to train a model to recognize handwritten digits with many details. Which model type is best to learn complex patterns?
Think about which model can capture complex features.
Deep neural networks with many layers can learn complex patterns better than simple models.
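One concrete way to see why extra layers and nonlinearity matter: no single linear layer can compute XOR, but a tiny two-layer ReLU network can. The sketch below uses hand-picked weights purely for illustration; a real model would learn them during training.

```python
# A two-layer ReLU network that computes XOR exactly.
# Weights are hand-chosen to demonstrate representational power,
# not learned.

def relu(v):
    return max(0.0, v)

def xor_net(x1, x2):
    h1 = relu(x1 + x2)          # hidden unit 1
    h2 = relu(x1 + x2 - 1.0)    # hidden unit 2
    return h1 - 2.0 * h2        # linear output layer

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_net(a, b))   # outputs match XOR: 0, 1, 1, 0
```

A single linear layer can only draw one straight decision boundary, which is why it cannot separate these four points; stacking layers with nonlinearities is what lets deeper models capture patterns like the strokes and curves in handwritten digits.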
During training, which metric should decrease to show the model is learning better?
Think about what measures error size.
Training loss measures error; it should decrease as the model improves.
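A minimal sketch of this, assuming plain gradient descent on a single example (a deliberately simplified stand-in for a real training loop): the recorded loss falls on every step.

```python
# Repeated gradient-descent steps on y = w * x; for this convex
# one-parameter problem the squared-error loss shrinks every step.
w, lr = 0.0, 0.1
x, y_true = 1.0, 2.0

losses = []
for _ in range(5):
    y_pred = w * x
    loss = (y_true - y_pred) ** 2
    losses.append(loss)
    w -= lr * (-2 * (y_true - y_pred) * x)   # gradient step

# roughly 4.0, 2.56, 1.638, 1.049, 0.671 -- monotonically decreasing
print([round(v, 3) for v in losses])
```

In practice one watches this curve (and a validation-loss curve) over epochs; a training loss that stops decreasing, or a validation loss that starts rising, signals that something needs attention.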
Look at this TensorFlow training code. Why do the model weights not change after training?
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.constant([[1.0]])
y_true = tf.constant([[2.0]])

with tf.GradientTape() as tape:
    y_pred = model(x)
    loss = loss_fn(y_true, y_pred)

gradients = tape.gradient(loss, model.trainable_variables)
# Missing optimizer.apply_gradients call here

print(model.trainable_variables[0].numpy())
Check if the code applies gradients to update weights.
Without a call to optimizer.apply_gradients(), the computed gradients are never used, so the weights keep their initial values.
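The failure mode can be sketched without TensorFlow: computing a gradient does nothing by itself; the weight only moves when an explicit update uses it. A plain-Python analogue (illustrative names, same logic as the snippet above):

```python
# Computing a gradient does not change the weight by itself;
# only an explicit update step does -- the plain-Python analogue
# of optimizer.apply_gradients.
w = 0.5
x, y_true, lr = 1.0, 2.0, 0.1

grad = -2 * (y_true - w * x) * x   # gradient computed (-3.0)...
# ...but without an update, w is still 0.5

w_updated = w - lr * grad          # the missing "apply" step
print(w, w_updated)                # 0.5 before, ~0.8 after
```

In the TensorFlow version, tape.gradient plays the role of the grad line, and optimizer.apply_gradients plays the role of the w_updated line; omitting the latter leaves the variables untouched.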