
Activation functions (ReLU, sigmoid, softmax) in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Activation functions (ReLU, sigmoid, softmax)
Problem: You have a neural network that classifies handwritten digits (0-9) from the MNIST dataset. The model currently uses sigmoid activation in every layer.
Current Metrics: Training accuracy: 92%, Validation accuracy: 85%, Training loss: 0.25, Validation loss: 0.40
Issue: The model trains, but validation accuracy trails training accuracy by 7 points, which suggests some overfitting, and learning is slow. Sigmoid saturates for large positive or negative inputs, so its gradients shrink as they propagate back through the layers (vanishing gradients).
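The vanishing-gradient effect can be sketched in plain Python. This is a simplified illustration, not the model above: it ignores the weights and assumes every layer sees the same pre-activation, and the helper names (`sigmoid_grad`, `relu_grad`, `chained_grad`) are made up for this example. It multiplies the local derivative of the activation once per layer, the way backpropagation would.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s); it peaks at 0.25 when x == 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

def chained_grad(local_grad, pre_activation, depth):
    # Product of `depth` identical local derivatives (weights ignored).
    return local_grad(pre_activation) ** depth

# x = 0.0 is the best case for sigmoid; x = 1.0 is an active ReLU unit.
for depth in (1, 3, 10):
    print(depth,
          chained_grad(sigmoid_grad, 0.0, depth),
          chained_grad(relu_grad, 1.0, depth))
```

Even in sigmoid's best case the factor per layer is at most 0.25, so the product shrinks geometrically with depth, while an active ReLU unit contributes a factor of 1.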
Your Task
Improve validation accuracy to above 90% by changing activation functions to reduce vanishing gradients and improve learning.
Keep the same model architecture (number of layers and units).
Only change activation functions in the hidden layers and output layer.
Use TensorFlow and Keras API.
Solution
TensorFlow
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten each 28x28 image to 784 values and scale pixels to [0, 1]
X_train = X_train.reshape(-1, 28*28) / 255.0
X_test = X_test.reshape(-1, 28*28) / 255.0

# Build model with ReLU and softmax activations
model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(28*28,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)

print(f"Test accuracy: {test_acc:.2f}")
Replaced sigmoid activation in hidden layers with ReLU to reduce vanishing gradients and speed up learning.
Changed output layer activation from sigmoid to softmax for proper multi-class probability output.
Kept the model architecture the same to isolate the effect of the activation functions.
Results Interpretation

Before: Training accuracy 92%, Validation accuracy 85%, Loss higher on validation.

After: Training accuracy 97%, Validation accuracy 92%, Loss decreased on validation.

Using ReLU in hidden layers helps the model learn faster and reduces vanishing gradients. Softmax in output layer correctly models multi-class probabilities, improving accuracy.
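To see why softmax suits a multi-class output better than per-unit sigmoid, here is a small standard-library sketch. The logits are made-up scores for four classes, and `softmax` is a hand-written helper rather than the TensorFlow op.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 0.5, -1.0, 0.0]  # hypothetical raw scores, one per class

per_class_sigmoid = [sigmoid(v) for v in logits]
probs = softmax(logits)

print(sum(per_class_sigmoid))  # each unit is squashed independently
print(sum(probs))              # normalised across classes
```

Sigmoid squashes each score on its own, so the outputs need not sum to 1. Softmax normalises across all classes, which is the form `sparse_categorical_crossentropy` expects by default.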
Bonus Experiment
Try adding dropout layers after each hidden layer to reduce overfitting and see if validation accuracy improves further.
💡 Hint
Add layers.Dropout(0.3) after each Dense layer with ReLU activation and retrain the model.
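As a rough picture of what a dropout layer does during training, the sketch below implements inverted dropout in plain Python. The function name `dropout`, the sample activations, and the seed are assumptions for illustration; in the actual experiment you would use `layers.Dropout(0.3)` as the hint says.

```python
import random

def dropout(activations, rate, rng, training=True):
    # Inverted dropout: zero each value with probability `rate` and
    # scale the survivors by 1 / (1 - rate) so the expected sum is unchanged.
    if not training or rate == 0.0:
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale
            for a in activations]

rng = random.Random(0)                     # fixed seed, for repeatability only
hidden = [0.5, 1.2, 0.0, 2.0, 0.7, 1.1]    # hypothetical hidden-layer outputs

print(dropout(hidden, 0.3, rng))                  # some units zeroed, rest scaled up
print(dropout(hidden, 0.3, rng, training=False))  # unchanged at inference time
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which is why dropout tends to reduce overfitting.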

Practice

1. Which activation function is best suited for hidden layers in a neural network to keep only positive signals?
easy
A. ReLU
B. Sigmoid
C. Softmax
D. Linear

Solution

  1. Step 1: Understand the role of activation functions in hidden layers

    Hidden layers need a non-linear activation so the network can learn complex patterns. The question asks for one that passes positive values and blocks negative ones.
  2. Step 2: Identify which function keeps positive signals

    ReLU (Rectified Linear Unit) outputs zero for negative inputs and passes positive inputs unchanged, making it ideal for hidden layers.
  3. Final Answer:

    ReLU -> Option A
  4. Quick Check:

    Hidden layers use ReLU = A [OK]
Hint: ReLU blocks negatives, perfect for hidden layers [OK]
Common Mistakes:
  • Confusing sigmoid as best for hidden layers
  • Thinking softmax works for hidden layers
  • Assuming linear activation adds non-linearity
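The last mistake in the list is worth a quick check: stacking layers with linear activation gives nothing beyond a single linear layer. The sketch below uses one-dimensional "layers" with arbitrary example weights (`w1`, `b1`, `w2`, `b2` are made up) to show the collapse, and how a ReLU in between breaks it.

```python
def dense(x, w, b):
    # A one-input, one-output dense layer with linear activation.
    return w * x + b

def relu(x):
    return max(0.0, x)

w1, b1 = 2.0, -1.0   # hypothetical first-layer weight and bias
w2, b2 = -3.0, 0.5   # hypothetical second-layer weight and bias

def two_linear_layers(x):
    return dense(dense(x, w1, b1), w2, b2)

def collapsed(x):
    # The same function written as one layer: (w2*w1)*x + (w2*b1 + b2).
    return dense(x, w2 * w1, w2 * b1 + b2)

def with_relu(x):
    return dense(relu(dense(x, w1, b1)), w2, b2)

for x in (-2.0, 0.0, 3.0):
    print(x, two_linear_layers(x), collapsed(x), with_relu(x))
```

The first two columns agree for every input, while the ReLU version differs wherever the first layer's output is negative. That kink is the non-linearity that lets extra layers add expressive power.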
2. Which of the following is the correct way to apply the sigmoid activation function in TensorFlow?
easy
A. tf.nn.relu(x)
B. tf.nn.sigmoid(x)
C. tf.sigmoid(x)
D. tf.activation.sigmoid(x)

Solution

  1. Step 1: Recall TensorFlow activation function syntax

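    TensorFlow provides activation functions under the tf.nn module, so sigmoid is available as tf.nn.sigmoid.
  2. Step 2: Check each option for correct syntax

    Option B calls tf.nn.sigmoid(x), which is the form this question expects. tf.nn.relu(x) applies ReLU rather than sigmoid, and tf.activation.sigmoid does not exist. Note that tf.sigmoid(x) is an alias of the same operation and also runs.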
  3. Final Answer:

    tf.nn.sigmoid(x) -> Option B
  4. Quick Check:

    Sigmoid in TensorFlow = tf.nn.sigmoid(x) [OK]
Hint: TensorFlow activations are in tf.nn module [OK]
Common Mistakes:
  • Assuming tf.sigmoid and tf.nn.sigmoid are different functions (they are aliases of the same op)
  • Confusing ReLU with sigmoid function
  • Trying to call activation from tf.activation
3. What will be the output of the following code snippet?
import tensorflow as tf
x = tf.constant([-1.0, 0.0, 1.0, 2.0])
output = tf.nn.relu(x)
print(output.numpy())
medium
A. [0.5 0.5 0.5 0.5]
B. [-1. 0. 1. 2.]
C. [1. 1. 1. 1.]
D. [0. 0. 1. 2.]

Solution

  1. Step 1: Understand ReLU behavior on input tensor

    ReLU outputs zero for negative inputs and passes positive inputs unchanged.
  2. Step 2: Apply ReLU to each element in x

    -1.0 becomes 0.0, 0.0 stays 0.0, 1.0 stays 1.0, 2.0 stays 2.0.
  3. Final Answer:

    [0. 0. 1. 2.] -> Option D
  4. Quick Check:

    ReLU([-1,0,1,2]) = [0,0,1,2] [OK]
Hint: ReLU clips negatives to zero, keeps positives [OK]
Common Mistakes:
  • Expecting negative values to remain
  • Confusing ReLU with sigmoid output
  • Assuming output is all ones
4. Identify the error in the following TensorFlow code that applies softmax activation:
import tensorflow as tf
x = tf.constant([2.0, 1.0, 0.1])
output = tf.nn.softmax(x, axis=1)
print(output.numpy())
medium
A. The axis parameter should be 0 or -1 for this tensor
B. Softmax cannot be applied to 1D tensors
C. The axis parameter should be omitted
D. The axis parameter should be 0 instead of 1

Solution

  1. Step 1: Check the shape of input tensor x

    x is a 1D tensor with shape (3,), so valid axis values are 0 or -1.
  2. Step 2: Understand axis parameter in softmax

    Axis=1 is invalid for 1D tensor because axis 1 does not exist; axis must be 0 or -1.
  3. Final Answer:

    The axis parameter should be 0 or -1 for this tensor -> Option A
  4. Quick Check:

    Softmax axis for 1D tensor = 0 or -1 [OK]
Hint: Axis must exist in tensor shape for softmax [OK]
Common Mistakes:
  • Using axis=1 on 1D tensor causes error
  • Thinking softmax can't apply to 1D tensors
  • Forgetting that the default, axis=-1, is already valid for a 1D tensor
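The axis rule can be mimicked without TensorFlow. In the sketch below, `normalize_axis` is a hypothetical helper that rejects an axis outside the tensor's rank, and `softmax_1d` is a hand-written softmax for a flat list. Neither is TensorFlow's implementation; they only mirror the behaviour described in the solution.

```python
import math

def normalize_axis(axis, ndim):
    # Valid axes for an ndim-D tensor are -ndim .. ndim-1.
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for a {ndim}-D tensor")
    return axis % ndim

def softmax_1d(values):
    # Subtract the max before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

x = [2.0, 1.0, 0.1]   # same values as the tensor in the question
ndim = 1              # a flat list has exactly one axis

for axis in (0, -1, 1):
    try:
        normalize_axis(axis, ndim)
        print(axis, [round(p, 3) for p in softmax_1d(x)])
    except ValueError as err:
        print(axis, "error:", err)
```

Axes 0 and -1 refer to the same (only) dimension and give identical probabilities, while axis 1 is rejected because a 1D tensor has no second dimension.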
5. You want to build a neural network for multi-class classification with 4 classes. Which activation function should you use in the output layer to get probabilities for each class?
hard
A. ReLU
B. Sigmoid
C. Softmax
D. Tanh

Solution

  1. Step 1: Understand output layer needs for multi-class classification

    Output layer must produce probabilities that sum to 1 across all classes.
  2. Step 2: Identify activation function that outputs class probabilities

    Softmax converts raw scores into probabilities summing to 1, perfect for multi-class outputs.
  3. Final Answer:

    Softmax -> Option C
  4. Quick Check:

    Multi-class output uses Softmax = C [OK]
Hint: Softmax outputs probabilities summing to 1 [OK]
Common Mistakes:
  • Using sigmoid for multi-class instead of softmax
  • Choosing ReLU which doesn't output probabilities
  • Confusing tanh with probability output