TensorFlow · ML · ~20 mins

Activation functions (ReLU, sigmoid, softmax) in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Activation functions (ReLU, sigmoid, softmax)
Problem: You have a neural network model for classifying handwritten digits (0-9) using the MNIST dataset. The model currently uses sigmoid activation in all layers.
Current Metrics: Training accuracy: 92%, validation accuracy: 85%, training loss: 0.25, validation loss: 0.40
Issue: The model trains, but validation accuracy lags training accuracy, indicating some overfitting, and learning is slow: sigmoid activation causes vanishing gradients in the deeper layers.
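To see why sigmoid causes vanishing gradients, here is a quick numerical sketch (plain NumPy, not part of the solution): the sigmoid derivative never exceeds 0.25, so each extra sigmoid layer shrinks the backpropagated gradient by at least a factor of 4.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1 - s)

# The derivative peaks at x = 0 with value 0.25 and decays toward 0
# for large |x| (saturation).
xs = np.linspace(-6, 6, 1001)
print(f"max sigmoid gradient: {sigmoid_grad(xs).max():.4f}")  # 0.2500

# After n sigmoid layers, the gradient scale is bounded by 0.25 ** n.
for n in (1, 3, 5):
    print(f"{n} layers: gradient scale <= {0.25 ** n:.5f}")
```

ReLU, by contrast, has derivative exactly 1 for all positive inputs, so gradients pass through active units undiminished.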
Your Task
Improve validation accuracy to above 90% by changing activation functions to reduce vanishing gradients and improve learning.
Keep the same model architecture (number of layers and units).
Only change activation functions in the hidden layers and output layer.
Use TensorFlow and Keras API.
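For reference, the starting point described in the problem might look like this. This is a hedged reconstruction: the 128/64-unit layer sizes are assumed to match the solution's architecture, since the exact baseline code is not given.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical baseline: the same architecture as the solution,
# but with sigmoid activation in every layer (the setup to fix).
baseline = models.Sequential([
    layers.Dense(128, activation='sigmoid', input_shape=(28*28,)),
    layers.Dense(64, activation='sigmoid'),
    # Sigmoid output gives 10 independent values in (0, 1) that do not
    # sum to 1 -- not a proper probability distribution over 10 classes.
    layers.Dense(10, activation='sigmoid'),
])
baseline.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
```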
Solution
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values
X_train = X_train.reshape(-1, 28*28) / 255.0
X_test = X_test.reshape(-1, 28*28) / 255.0

# Build model with ReLU and softmax activations
model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(28*28,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)

print(f"Test accuracy: {test_acc:.2f}")
Replaced sigmoid activation in hidden layers with ReLU to reduce vanishing gradients and speed up learning.
Changed output layer activation from sigmoid to softmax for proper multi-class probability output.
Kept model architecture same to isolate effect of activation functions.
Results Interpretation

Before: Training accuracy 92%, Validation accuracy 85%, Loss higher on validation.

After: Training accuracy 97%, Validation accuracy 92%, Loss decreased on validation.

Using ReLU in hidden layers helps the model learn faster and reduces vanishing gradients. Softmax in output layer correctly models multi-class probabilities, improving accuracy.
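A small numeric sketch (plain NumPy, with made-up logits) of why softmax is the right output activation here: it exponentiates and normalizes the logits, so the outputs form a proper probability distribution over the classes.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # example logits for 3 classes
probs = softmax(logits)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0 -- a proper distribution, unlike per-unit sigmoid
```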
Bonus Experiment
Try adding dropout layers after each hidden layer to reduce overfitting and see if validation accuracy improves further.
💡 Hint
Add layers.Dropout(0.3) after each Dense layer with ReLU activation and retrain the model.
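Following the hint, the bonus experiment can be sketched as below. This is a sketch, not a verified result: the 0.3 dropout rate comes from the hint, and you may need to tune it if validation accuracy does not improve.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Same architecture as the solution, with Dropout(0.3) after each
# hidden ReLU layer to randomly zero 30% of activations during training.
reg_model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(28*28,)),
    layers.Dropout(0.3),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(10, activation='softmax'),
])
reg_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
```

Dropout is active only during training; at evaluation time Keras disables it automatically, so the comparison against the non-dropout validation accuracy is fair.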