He initialization is often recommended for deep networks with ReLU activations. Why is this the case?
Think about how activation variance affects gradient flow in deep networks.
He initialization scales weights to maintain activation variance through layers, especially with ReLU, which helps avoid vanishing or exploding gradients.
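The variance-preserving argument can be checked numerically. Below is a minimal sketch, assuming a NumPy re-implementation of He normal initialization (std = sqrt(2/fan_in)); the function name `he_normal` is illustrative, not a TensorFlow API:

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    # He initialization: std = sqrt(2 / fan_in). ReLU zeroes roughly
    # half its inputs, halving the signal's second moment, and the
    # factor of 2 in the variance compensates for that.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 256))  # second moment starts near 1
for _ in range(5):
    W = he_normal(256, 256, rng)
    x = np.maximum(0.0, x @ W)    # ReLU layer

# The mean squared activation stays roughly constant across layers
# instead of shrinking toward 0 or blowing up.
print(float((x ** 2).mean()))
```

With a scheme that ignores the ReLU factor (e.g. std = sqrt(1/fan_in)), the same loop halves the signal at every layer.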
What is the shape of the weights tensor initialized by the following code?
import tensorflow as tf

initializer = tf.keras.initializers.GlorotUniform()
weights = initializer(shape=(64, 128))
print(weights.shape)
Look at the shape argument passed to the initializer.
The initializer creates a tensor with the exact shape specified, here (64, 128).
Which weight initialization strategy is best suited for a network using sigmoid activation functions to reduce vanishing gradients?
Consider the activation function's output range and how initialization affects gradient flow.
Xavier initialization is designed to keep variance stable for activations like sigmoid and tanh, helping reduce vanishing gradients.
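Xavier (Glorot) uniform draws from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), giving weight variance 2 / (fan_in + fan_out). A NumPy sketch of that rule (the helper name `glorot_uniform` is illustrative, not the TensorFlow API itself):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Uniform(-a, a) has variance a**2 / 3, so this limit yields
    # weight variance 2 / (fan_in + fan_out) -- balanced for both
    # the forward pass (fan_in) and the backward pass (fan_out).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(64, 128, np.random.default_rng(0))
# Empirical variance should land close to 2 / (64 + 128) ~= 0.0104.
print(float(W.var()))
```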
What is the most likely effect on training loss and accuracy if weights are initialized with very large random values?
Think about how large initial weights affect gradient calculations.
Large initial weights make activations and gradients explode as they pass through successive layers, so the loss diverges or oscillates and accuracy stays near chance.
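The blow-up is easy to demonstrate. A sketch, assuming NumPy and a deliberately oversized weight standard deviation of 5.0 (versus He's sqrt(2/64) ~= 0.18 for this fan-in):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 64))

# With std = 5.0, each ReLU layer multiplies the activation scale by
# roughly sqrt(64 * 25 / 2) ~= 28, so magnitudes grow geometrically.
for layer in range(6):
    W = rng.normal(0.0, 5.0, size=(64, 64))
    x = np.maximum(0.0, x @ W)
    print(layer, float(np.abs(x).mean()))
```

Gradients scale with these activations, so backpropagation produces equally enormous updates and training diverges.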
What error will this TensorFlow custom initializer code raise?
import tensorflow as tf

class CustomInit(tf.keras.initializers.Initializer):
    def __call__(self, shape, dtype=None):
        return tf.random.uniform(shape, minval=-1, maxval=1, dtype=dtype)

initializer = CustomInit()
weights = initializer(shape=(32, 32))
Check what dtype tf.random.uniform falls back to when dtype is None.
No error occurs because tf.random.uniform uses dtype=tf.float32 by default when not specified.
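The same default-dtype fallback can be mimicked without TensorFlow. A sketch, assuming a NumPy stand-in for the initializer protocol (the class name mirrors the snippet above; the NumPy translation is an assumption, not TensorFlow's implementation):

```python
import numpy as np

class CustomInit:
    # Mirrors the initializer protocol: __call__(shape, dtype=None).
    def __call__(self, shape, dtype=None):
        # Like tf.random.uniform, fall back to float32 when dtype is None,
        # so passing no dtype is not an error.
        dtype = dtype or np.float32
        rng = np.random.default_rng(0)
        return rng.uniform(-1.0, 1.0, size=shape).astype(dtype)

weights = CustomInit()(shape=(32, 32))
print(weights.dtype, weights.shape)  # float32 (32, 32)
```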