
Weight initialization strategies in TensorFlow

Introduction

Weight initialization sets the starting values of a model's weights; starting from well-scaled values helps the network learn faster and more reliably.

Typical situations where it matters:
When building a neural network, to avoid slow or stalled learning.
When training deep models, to prevent vanishing or exploding gradients (the sketch after this list shows the effect).
When experimenting with different model architectures, to improve accuracy.
When using activation functions like ReLU or sigmoid, which each need a matching initializer.
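Here is a minimal sketch of why scale matters (the depth, width, and stddev values are arbitrary illustration choices): push a batch through a deep stack of ReLU layers and compare how the activation spread holds up under a too-small RandomNormal versus HeNormal.

TensorFlow
import tensorflow as tf

# Push a batch through a deep ReLU stack and report the final activation spread
def activation_std(initializer, depth=20, width=256):
    x = tf.random.normal((64, width))
    for _ in range(depth):
        x = tf.keras.layers.Dense(width, activation='relu',
                                  kernel_initializer=initializer)(x)
    return float(tf.math.reduce_std(x))

# Weights that are too small shrink the signal layer by layer (vanishing)
print(activation_std(tf.keras.initializers.RandomNormal(stddev=0.01)))
# HeNormal is scaled for ReLU, so the signal keeps a healthy spread
print(activation_std(tf.keras.initializers.HeNormal()))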
Syntax
TensorFlow
initializer = tf.keras.initializers.HeNormal()
layer = tf.keras.layers.Dense(units=64, activation='relu', kernel_initializer=initializer)

You choose an initializer and pass it to the layer's kernel_initializer argument.

Common initializers include GlorotUniform, HeNormal, and RandomNormal.
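Keras also accepts initializers by their string names, and biases have their own bias_initializer argument (which defaults to zeros):

TensorFlow
# Same layer using string shortcuts instead of initializer objects
layer = tf.keras.layers.Dense(64, activation='relu',
                              kernel_initializer='he_normal',
                              bias_initializer='zeros')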

Examples
GlorotUniform is good for layers with sigmoid or tanh activations.
TensorFlow
initializer = tf.keras.initializers.GlorotUniform()
layer = tf.keras.layers.Dense(32, activation='tanh', kernel_initializer=initializer)
HeNormal works well with ReLU activations to keep gradients stable.
TensorFlow
initializer = tf.keras.initializers.HeNormal()
layer = tf.keras.layers.Dense(64, activation='relu', kernel_initializer=initializer)
RandomNormal initializes weights with small random values from a normal distribution.
TensorFlow
initializer = tf.keras.initializers.RandomNormal(mean=0., stddev=0.05)
layer = tf.keras.layers.Dense(10, activation='softmax', kernel_initializer=initializer)
Sample Model

This code builds a small neural network using the HeNormal initializer for weights. It trains on random data for 3 epochs and then makes predictions on new random inputs.

TensorFlow
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Create a simple model with the HeNormal initializer
initializer = tf.keras.initializers.HeNormal()

model = models.Sequential([
    layers.Input(shape=(20,)),  # declare the input shape explicitly
    layers.Dense(64, activation='relu', kernel_initializer=initializer),
    layers.Dense(10, activation='softmax', kernel_initializer=initializer)
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Generate dummy data: 100 samples, 20 features, integer labels 0-9
x_train = np.random.rand(100, 20).astype('float32')
y_train = np.random.randint(0, 10, size=(100,))

# Train the model
history = model.fit(x_train, y_train, epochs=3, batch_size=10, verbose=2)

# Make predictions on new data
x_test = np.random.rand(5, 20).astype('float32')
predictions = model.predict(x_test)

print('Predictions shape:', predictions.shape)
print('First prediction:', predictions[0])
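Because the training data and the starting weights are both random, the printed values differ on every run. If you need reproducible starting weights, initializers accept a seed:

TensorFlow
# Fix the initializer's seed so the starting weights are the same every run
initializer = tf.keras.initializers.HeNormal(seed=42)
layer = tf.keras.layers.Dense(64, activation='relu', kernel_initializer=initializer)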
Important Notes

Choosing the right initializer can help your model train faster and avoid common problems.

He initialization is best for ReLU activations, while Glorot (Xavier) is good for sigmoid or tanh.

Set kernel_initializer explicitly when creating layers to control how weights start; otherwise Dense layers fall back to their default, GlorotUniform.
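To see what these initializers actually do, you can sample weights directly and check them against their formulas: HeNormal targets a standard deviation of roughly sqrt(2 / fan_in), and GlorotUniform draws from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The fan sizes below are arbitrary:

TensorFlow
import numpy as np
import tensorflow as tf

fan_in, fan_out = 512, 256

# HeNormal: the standard deviation should be close to sqrt(2 / fan_in)
w = tf.keras.initializers.HeNormal()(shape=(fan_in, fan_out)).numpy()
print('He std:', w.std(), 'vs target:', np.sqrt(2 / fan_in))

# GlorotUniform: values stay inside +/- sqrt(6 / (fan_in + fan_out))
w = tf.keras.initializers.GlorotUniform()(shape=(fan_in, fan_out)).numpy()
print('Glorot max:', np.abs(w).max(), 'vs limit:', np.sqrt(6 / (fan_in + fan_out)))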

Summary

Weight initialization sets starting values for model weights to help learning.

Use HeNormal for ReLU and GlorotUniform for sigmoid/tanh activations.

Proper initialization prevents slow or stuck training and improves accuracy.