Which of the following is the main reason batch normalization is used to improve training?
- Avoiding vanishing gradients
- Reducing internal covariate shift
- Increasing model capacity
- Replacing dropout
Think about what happens to the input distribution of layers during training.
Batch normalization normalizes the inputs to each layer to keep their distribution stable, which helps training converge faster and more reliably.
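As a concrete illustration, here is a minimal NumPy sketch of the per-feature transform a batch normalization layer applies in training mode (the function name and the fixed gamma/beta defaults are illustrative; the eps value mirrors Keras's default of 1e-3):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize each feature (column) of a batch to zero mean and unit
    variance, then apply a learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta

# A batch of 32 samples with 10 features, deliberately shifted and scaled.
x = 5.0 * np.random.randn(32, 10) + 3.0
y = batch_norm(x)
print(y.mean(axis=0).round(6))  # ~0 for every feature
print(y.std(axis=0).round(3))   # ~1 for every feature
```

Whatever distribution the inputs had, every feature leaves the layer with roughly zero mean and unit variance, which is the stabilizing effect described above.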
Given the following TensorFlow code, what is the shape of output?
import tensorflow as tf

input_tensor = tf.random.normal([32, 28, 28, 3])
bn_layer = tf.keras.layers.BatchNormalization()
output = bn_layer(input_tensor)
output_shape = output.shape
Batch normalization does not change the shape of the input tensor.
The batch normalization layer normalizes each feature but leaves the shape unchanged, so output.shape is (32, 28, 28, 3), identical to the input.
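Shape preservation can be checked with plain NumPy by normalizing over every axis except the channel axis, which is how the layer treats 4-D NHWC inputs by default (using axis=-1 as the normalized axis is an assumption matching Keras's default):

```python
import numpy as np

x = np.random.randn(32, 28, 28, 3)

# Reduce over batch, height, and width; keep per-channel statistics,
# mirroring BatchNormalization's default axis=-1 for image tensors.
mean = x.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, 3)
var = x.var(axis=(0, 1, 2), keepdims=True)
x_hat = (x - mean) / np.sqrt(var + 1e-3)

print(x_hat.shape)  # (32, 28, 28, 3) -- same as the input
```

Normalization is an elementwise affine transform of the input, so no dimension can change.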
In TensorFlow's BatchNormalization layer, the momentum parameter controls the moving average of mean and variance. Which statement about setting momentum is true?
Think about how moving averages work with momentum values close to 1.
Momentum near 1 means the moving averages change slowly, retaining more past information; a lower momentum updates faster with new batch statistics. Keras applies moving_mean = moving_mean * momentum + batch_mean * (1 - momentum), and likewise for the variance.
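A small pure-Python sketch of that exponential moving average makes the effect of momentum visible (the batch statistic is held at a constant 1.0 here purely for illustration):

```python
def update(moving, batch_stat, momentum):
    # Keras-style exponential moving average update.
    return moving * momentum + batch_stat * (1 - momentum)

moving_slow = 0.0
moving_fast = 0.0
for _ in range(10):
    moving_slow = update(moving_slow, 1.0, momentum=0.99)  # changes slowly
    moving_fast = update(moving_fast, 1.0, momentum=0.5)   # tracks batches quickly

print(round(moving_slow, 3), round(moving_fast, 3))  # 0.096 0.999
```

After ten identical batches the high-momentum average has barely moved toward the batch statistic, while the low-momentum average has nearly converged to it.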
Consider this code snippet:
import tensorflow as tf

input_tensor = tf.random.normal([1, 10])
bn = tf.keras.layers.BatchNormalization()
output = bn(input_tensor)
Does running this raise an error?
Think about how variance is calculated with only one sample, and remember the epsilon term in the denominator.
No error is raised. Calling the layer directly runs it in inference mode, which uses the stored moving statistics. Even with training=True, the variance of a single sample is simply 0, and the epsilon term prevents division by zero; the normalized activations collapse to zero (before the scale and shift are applied), so batch statistics are meaningless at batch size 1 and training becomes unstable.
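A NumPy sketch of the single-sample case in training mode shows the degenerate behavior (the eps value of 1e-3 mirrors Keras's default and is the only assumed parameter):

```python
import numpy as np

x = np.array([[2.0, -1.0, 5.0]])            # batch of one sample, 3 features
mean = x.mean(axis=0)                        # equals the sample itself
var = x.var(axis=0)                          # variance of one sample is 0
x_hat = (x - mean) / np.sqrt(var + 1e-3)     # epsilon avoids division by zero

print(var)    # [0. 0. 0.]
print(x_hat)  # [[0. 0. 0.]] -- every feature collapses to zero
```

Nothing crashes, but every normalized activation is zero regardless of the input values, which is why batch normalization is ineffective at batch size 1.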
You train two identical neural networks on the same data. One uses batch normalization after each dense layer; the other does not. Which of the following is the most likely difference in training metrics?
Batch normalization stabilizes and speeds up training.
Batch normalization reduces internal covariate shift, allowing faster convergence and often better accuracy.