
Activation functions (ReLU, sigmoid, softmax) in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Activation functions (ReLU, sigmoid, softmax)

Problem: You have a neural network model for classifying handwritten digits (0-9) using the MNIST dataset. The model currently uses sigmoid activation in all layers.

Current Metrics: Training accuracy: 92%, Validation accuracy: 85%, Training loss: 0.25, Validation loss: 0.40

Issue: The model trains, but validation accuracy trails training accuracy by 7 points, which indicates some overfitting, and learning is slow. Sigmoid saturates for large positive or negative inputs and its derivative never exceeds 0.25, so gradients shrink as they are propagated back through successive sigmoid layers (the vanishing gradient problem).
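
To make the vanishing gradient issue concrete, here is a minimal pure-Python sketch rather than the tutorial's own code. It multiplies the activation derivatives along a chain of layers and deliberately ignores the weights, which is a simplifying assumption. The helper names (`sigmoid_grad`, `relu_grad`, `chained_grad`) are illustrative and not part of TensorFlow.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, pre_activations):
    """Multiply the local activation derivatives along a chain of layers.

    Weights are ignored on purpose so that only the activation's
    contribution to the backpropagated gradient is visible.
    """
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product

# Best case for sigmoid: every pre-activation sits at 0, where the slope peaks.
best_case_sigmoid = chained_grad(sigmoid_grad, [0.0] * 5)
# ReLU with positive pre-activations passes the gradient through unchanged.
active_relu = chained_grad(relu_grad, [1.0] * 5)

print(f"sigmoid, 5 layers: {best_case_sigmoid:.6f}")  # equals 0.25 ** 5
print(f"ReLU, 5 layers:    {active_relu:.6f}")
```

Even in its best case the sigmoid chain scales the gradient by 0.25 per layer, while active ReLU units leave it untouched. The trade-off is that a ReLU unit with a negative pre-activation passes no gradient at all.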

Your Task

Improve validation accuracy to above 90% by changing activation functions to reduce vanishing gradients and improve learning.

Keep the same model architecture (number of layers and units).

Only change activation functions in the hidden layers and output layer.

Use TensorFlow and the Keras API.

Solution

import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a 784-element vector and scale pixel values to [0, 1]
X_train = X_train.reshape(-1, 28*28) / 255.0
X_test = X_test.reshape(-1, 28*28) / 255.0

# Build model with ReLU and softmax activations
model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(28*28,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)

print(f"Test accuracy: {test_acc:.2f}")

Replaced sigmoid with ReLU in the hidden layers to reduce vanishing gradients and speed up learning.

Changed the output layer activation from sigmoid to softmax so the 10 outputs form a probability distribution over the digit classes (non-negative and summing to 1).

Kept the model architecture the same to isolate the effect of the activation functions.
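
The difference between a sigmoid output layer and a softmax one can be seen on a small vector of raw scores. The sketch below is plain Python, not the Keras implementation, and the scores are hypothetical values chosen for illustration. It also computes the essence of the sparse categorical cross-entropy loss used in the solution: the negative log of the probability assigned to the true class.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid applied to a single score."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label: int) -> float:
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])

# Hypothetical raw scores for three classes (not taken from the MNIST model).
logits = [2.0, 1.0, 0.1]

sigmoid_out = [sigmoid(z) for z in logits]  # each score squashed independently
softmax_out = softmax(logits)               # scores normalized jointly

print("sigmoid sum:", round(sum(sigmoid_out), 3))
print("softmax sum:", round(sum(softmax_out), 3))
print("loss if true class is 0:",
      round(sparse_categorical_crossentropy(softmax_out, 0), 3))
```

Per-unit sigmoid outputs do not sum to 1, so they cannot be read as one distribution over mutually exclusive digit classes. Softmax outputs always can.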

Results Interpretation

Before: Training accuracy 92%, validation accuracy 85%; validation loss (0.40) higher than training loss (0.25).

After: Training accuracy 97%, validation accuracy 92%; validation loss decreased.

ReLU in the hidden layers does not saturate for positive inputs, so gradients reach the earlier layers largely intact and the model learns faster. Softmax in the output layer models mutually exclusive classes as a single probability distribution, which suits 10-way digit classification and improves accuracy.

Bonus Experiment

Try adding dropout layers after each hidden layer to reduce overfitting and see if validation accuracy improves further.

💡 Hint

Add layers.Dropout(0.3) after each Dense layer with ReLU activation and retrain the model.
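
To see what a dropout rate of 0.3 does to a layer's outputs, here is a minimal pure-Python sketch of inverted dropout. It mirrors the documented behavior of `layers.Dropout`: during training, inputs are zeroed with probability `rate` and the rest are scaled by 1 / (1 - rate), while at inference nothing changes. The function `dropout` and the all-ones activations are illustrative assumptions, not Keras code.

```python
import random

def dropout(activations, rate: float, training: bool, rng: random.Random):
    """Inverted dropout on a list of activations.

    During training each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate); at inference the input is
    returned unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)   # fixed seed so the run is repeatable
hidden = [1.0] * 10_000  # hypothetical ReLU outputs, all equal for clarity

train_out = dropout(hidden, rate=0.3, training=True, rng=rng)
infer_out = dropout(hidden, rate=0.3, training=False, rng=rng)

dropped_fraction = train_out.count(0.0) / len(train_out)
mean_after = sum(train_out) / len(train_out)

print(f"fraction dropped in training: {dropped_fraction:.3f}")
print(f"mean activation in training:  {mean_after:.3f}")
print(f"inference output unchanged:   {infer_out == hidden}")
```

Roughly 30% of the units are silenced on each training pass, which discourages the network from relying on any single unit. The rescaling keeps the expected activation the same between training and inference.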

Practice

1. Which activation function is best suited to hidden layers when only positive signals should pass through? (Difficulty: easy)

A. ReLU

B. Sigmoid

C. Softmax

D. Linear

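Solution

Step 1: Understand the role of activation functions in hidden layers. They add non-linearity, so stacked layers can learn more than a single linear map.

Step 2: Identify which function keeps positive signals. ReLU computes max(0, x): positive inputs pass through unchanged and negative inputs become 0. Sigmoid squashes every input into (0, 1), softmax normalizes a whole vector into probabilities, and linear leaves negative values as they are.

Final Answer: A. ReLU

Quick Check: ReLU(-2) = 0 and ReLU(3) = 3, so only the positive signal survives.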
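
The behavior asked about in the practice question above, keeping positive values and zeroing the rest, can be checked on a handful of numbers. This is a plain-Python sketch with made-up inputs. In TensorFlow, `tf.nn.relu` applies the same rule element-wise to a tensor.

```python
def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

# Made-up pre-activation values covering negative, zero, and positive cases.
values = [-3.0, -0.5, 0.0, 2.0, 4.5]
activated = [relu(v) for v in values]

print(activated)  # [0.0, 0.0, 0.0, 2.0, 4.5]
```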
