
Activation functions (ReLU, sigmoid, softmax) in TensorFlow - Model Metrics & Evaluation

Metrics & Evaluation - Activation functions (ReLU, sigmoid, softmax)
Which metrics matter for activation functions, and why

Activation functions like ReLU, sigmoid, and softmax help a model learn complex patterns by deciding how much signal passes through each neuron.

To evaluate models using these activations, we focus on metrics like accuracy for classification tasks, cross-entropy loss to measure prediction quality, and probability calibration especially with softmax outputs.

Why? Because activation functions shape the output values, which affect how well the model predicts classes or probabilities.
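
As a rough illustration of how each activation shapes its output, here is a minimal plain-Python sketch of the three functions. It is not TensorFlow's implementation; the helper names `relu`, `sigmoid`, and `softmax` are defined here only for illustration, and the input values are arbitrary.

```python
import math

def relu(x):
    # Passes positive values through unchanged, maps negatives to zero.
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

relu_out = [relu(v) for v in [-1.0, 0.0, 1.0, 2.0]]
sigmoid_out = sigmoid(0.0)
softmax_out = softmax([2.0, 1.0, 0.1])

print(relu_out)      # [0.0, 0.0, 1.0, 2.0]
print(sigmoid_out)   # 0.5
print(softmax_out)   # three probabilities that sum to 1
```

ReLU output is unbounded above, sigmoid output is a single probability-like value, and softmax output is a distribution over classes, which is why the metrics discussed here differ by output layer.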

Confusion Matrix Example

For a classification model using softmax activation on the output layer, the confusion matrix shows how many predictions were correct or wrong:

      Actual \ Predicted |  Class A  |  Class B  |  Class C
      -------------------|----------|----------|---------
      Class A            |    50    |    2     |    3
      Class B            |    4     |   45     |    1
      Class C            |    2     |    3     |   48
    

This matrix helps calculate accuracy, precision, and recall for each class.
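
The calculation can be sketched in plain Python using the counts from the matrix above. The variable names are illustrative; rows are actual classes and columns are predicted classes.

```python
labels = ["Class A", "Class B", "Class C"]

# Rows = actual class, columns = predicted class (counts from the matrix above).
confusion = [
    [50, 2, 3],
    [4, 45, 1],
    [2, 3, 48],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = correct / total

precision = {}
recall = {}
for i, name in enumerate(labels):
    predicted_as_i = sum(row[i] for row in confusion)  # column sum
    actually_i = sum(confusion[i])                      # row sum
    precision[name] = confusion[i][i] / predicted_as_i
    recall[name] = confusion[i][i] / actually_i

print(f"accuracy = {accuracy:.3f}")
for name in labels:
    print(f"{name}: precision = {precision[name]:.3f}, recall = {recall[name]:.3f}")
```

For these counts, 143 of 158 predictions are on the diagonal, so accuracy is about 0.905.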

Precision vs Recall Tradeoff with Activation Functions

Activation functions influence model outputs and thus affect precision and recall.

Example: Using sigmoid activation for a binary classifier, adjusting the decision threshold changes precision and recall.

  • Lower threshold: more positives predicted, higher recall but lower precision.
  • Higher threshold: fewer positives predicted, higher precision but lower recall.
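
The threshold effect described above can be sketched in plain Python. The scores and labels below are hypothetical values invented for illustration, and `precision_recall` is a helper defined only for this example.

```python
# Hypothetical sigmoid outputs and true labels (1 = positive), invented for illustration.
scores = [0.95, 0.80, 0.65, 0.55, 0.45, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

def precision_recall(threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

results = {t: precision_recall(t) for t in (0.3, 0.5, 0.7)}
for t, (p, r) in results.items():
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves precision from 0.67 to 1.00 while recall drops from 1.00 to 0.50.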

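Softmax outputs a probability for each class, and the usual decision rule is to pick the class with the highest probability. That rule has no tunable threshold, so it does not by itself guarantee balanced precision and recall for every class; check both per class.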

Good vs Bad Metric Values for Models Using These Activations

Good:

  • High accuracy (e.g., > 85%) on test data.
  • Low cross-entropy loss indicating confident correct predictions.
  • Balanced precision and recall, avoiding bias toward false positives or negatives.

Bad:

  • Accuracy near random chance (e.g., ~33% for 3 classes with softmax).
  • High cross-entropy loss, meaning the model assigns low probability to the correct class.
  • Very low recall or precision indicating the model misses many positives or makes many false alarms.
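
To make "low" and "high" loss concrete, here is a small plain-Python sketch of cross-entropy for a single 3-class example. The probability vectors are hypothetical, and `cross_entropy` is a helper defined only for this illustration.

```python
import math

def cross_entropy(probs, true_index):
    # Loss for one example: negative log of the probability given to the true class.
    return -math.log(probs[true_index])

# The true class is index 0 in every case below (hypothetical predictions).
confident_correct = cross_entropy([0.90, 0.05, 0.05], 0)
uniform_guess = cross_entropy([1 / 3, 1 / 3, 1 / 3], 0)
confident_wrong = cross_entropy([0.05, 0.90, 0.05], 0)

print(f"confident and correct: {confident_correct:.3f}")
print(f"uniform guess:         {uniform_guess:.3f}")
print(f"confident and wrong:   {confident_wrong:.3f}")
```

A uniform guess over 3 classes gives a loss of ln(3), roughly 1.10, which is a useful reference point: a loss near that value suggests the model is doing little better than chance.
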
Common Pitfalls in Metrics with Activation Functions
  • Ignoring threshold effects: Sigmoid outputs need threshold tuning; default 0.5 may not be best.
  • Overconfidence: Softmax can assign a high probability to a wrong class, so a confident output is not evidence of a correct prediction, and accuracy alone does not reveal this.
  • Vanishing gradients: Sigmoid can cause slow learning in deep networks, hurting final metrics.
  • Not checking calibration: Probabilities from activations may not reflect true likelihoods.
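
The overconfidence and calibration pitfalls can be illustrated with a plain-Python sketch. It assumes hypothetical logits and a locally defined `softmax` helper: scaling the logits leaves the predicted class unchanged but makes the top probability much larger.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # hypothetical raw scores
base = softmax(logits)
scaled = softmax([5.0 * v for v in logits])  # same ranking, larger magnitudes

print("predicted class:", base.index(max(base)), scaled.index(max(scaled)))
print(f"top probability: {max(base):.3f} vs {max(scaled):.3f}")
```

Both versions predict the same class, so accuracy is identical, yet the reported confidence rises from about 0.66 to above 0.99. This is why probabilities should be checked against observed outcomes rather than trusted at face value.
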
Self Check

Your model uses sigmoid activation for binary classification. It has 98% accuracy but only 12% recall on the positive class. Is it good for production?

Answer: No. The model misses most positive cases (low recall), which can be critical depending on the task (e.g., disease detection). High accuracy is misleading because negatives dominate.
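
One set of counts that reproduces these numbers is sketched below. The counts are hypothetical and chosen only to match 98% accuracy and 12% recall.

```python
# Hypothetical counts chosen to reproduce the self-check numbers.
tp, fn = 3, 22      # 25 actual positives, only 3 detected
tn, fp = 1075, 0    # 1075 actual negatives, all classified correctly

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

# Baseline: a model that predicts "negative" for every input.
always_negative_accuracy = (tn + fp) / total

print(f"accuracy = {accuracy:.3f}")
print(f"recall   = {recall:.3f}")
print(f"always-negative baseline accuracy = {always_negative_accuracy:.3f}")
```

With positives making up about 2% of the data, a model that never predicts the positive class already scores close to 98% accuracy, which is why accuracy alone says little here.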

Key Result
Activation functions shape outputs that affect accuracy, loss, precision, and recall; understanding their impact helps evaluate model quality correctly.

Practice

(1/5)
1. Which activation function is best suited for hidden layers in a neural network to keep only positive signals?
easy
A. ReLU
B. Sigmoid
C. Softmax
D. Linear

Solution

  1. Step 1: Understand the role of activation functions in hidden layers

    Hidden layers need a non-linear activation to learn complex patterns. The question asks for the one that lets positive values pass and blocks negative ones.
  2. Step 2: Identify which function keeps positive signals

    ReLU (Rectified Linear Unit) outputs zero for negative inputs and passes positive inputs unchanged, making it ideal for hidden layers.
  3. Final Answer:

    ReLU -> Option A
  4. Quick Check:

    Hidden layers use ReLU = A [OK]
Hint: ReLU blocks negatives, perfect for hidden layers [OK]
Common Mistakes:
  • Confusing sigmoid as best for hidden layers
  • Thinking softmax works for hidden layers
  • Assuming linear activation adds non-linearity
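
The last mistake can be checked with a tiny plain-Python sketch. It uses two hypothetical one-neuron layers with made-up weights `W1` and `W2`: with a linear activation the stack collapses to a single multiplication, while ReLU breaks that.

```python
def relu(x):
    # Passes positive values through unchanged, maps negatives to zero.
    return max(0.0, x)

W1, W2 = 2.0, 3.0  # hypothetical weights of two one-neuron layers (no bias)

def linear_stack(x):
    # Linear activation between layers: equivalent to multiplying by W1 * W2.
    return W2 * (W1 * x)

def relu_stack(x):
    # ReLU between layers: negative pre-activations are zeroed out.
    return W2 * relu(W1 * x)

inputs = [-2.0, -1.0, 1.0, 2.0]
linear_out = [linear_stack(x) for x in inputs]
relu_out = [relu_stack(x) for x in inputs]

print(linear_out)  # [-12.0, -6.0, 6.0, 12.0]
print(relu_out)    # [0.0, 0.0, 6.0, 12.0]
```

The linear stack is still a straight line through the origin, so the extra layer adds no expressive power; the ReLU stack is not.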
2. Which of the following is the correct way to apply the sigmoid activation function in TensorFlow?
easy
A. tf.nn.relu(x)
B. tf.nn.sigmoid(x)
C. tf.sigmoid(x)
D. tf.activation.sigmoid(x)

Solution

  1. Step 1: Recall TensorFlow activation function syntax

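    TensorFlow provides activation functions under the tf.nn module, so sigmoid is available as tf.nn.sigmoid.
  2. Step 2: Check each option for correct syntax

    tf.nn.sigmoid(x) is a valid sigmoid call. tf.nn.relu(x) applies ReLU, not sigmoid, and tf.activation is not a TensorFlow module. tf.sigmoid(x) is also accepted in TensorFlow 2.x as an alias of the same operation, so Option B is the intended answer because it is the tf.nn form the question is about.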
  3. Final Answer:

    tf.nn.sigmoid(x) -> Option B
  4. Quick Check:

    Sigmoid in TensorFlow = tf.nn.sigmoid(x) [OK]
Hint: TensorFlow activations are in tf.nn module [OK]
Common Mistakes:
  • Assuming tf.sigmoid and tf.nn.sigmoid behave differently (in TensorFlow 2.x they are aliases)
  • Confusing ReLU with sigmoid function
  • Trying to call activation from tf.activation
3. What will be the output of the following code snippet?
import tensorflow as tf
x = tf.constant([-1.0, 0.0, 1.0, 2.0])
output = tf.nn.relu(x)
print(output.numpy())
medium
A. [0.5 0.5 0.5 0.5]
B. [-1. 0. 1. 2.]
C. [1. 1. 1. 1.]
D. [0. 0. 1. 2.]

Solution

  1. Step 1: Understand ReLU behavior on input tensor

    ReLU outputs zero for negative inputs and passes positive inputs unchanged.
  2. Step 2: Apply ReLU to each element in x

    -1.0 becomes 0.0, 0.0 stays 0.0, 1.0 stays 1.0, 2.0 stays 2.0.
  3. Final Answer:

    [0. 0. 1. 2.] -> Option D
  4. Quick Check:

    ReLU([-1,0,1,2]) = [0,0,1,2] [OK]
Hint: ReLU clips negatives to zero, keeps positives [OK]
Common Mistakes:
  • Expecting negative values to remain
  • Confusing ReLU with sigmoid output
  • Assuming output is all ones
4. Identify the error in the following TensorFlow code that applies softmax activation:
import tensorflow as tf
x = tf.constant([2.0, 1.0, 0.1])
output = tf.nn.softmax(x, axis=1)
print(output.numpy())
medium
A. The axis parameter should be 0 or -1 for this tensor
B. Softmax cannot be applied to 1D tensors
C. The axis parameter should be omitted
D. The axis parameter should be 0 instead of 1

Solution

  1. Step 1: Check the shape of input tensor x

    x is a 1D tensor with shape (3,), so valid axis values are 0 or -1.
  2. Step 2: Understand axis parameter in softmax

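    axis=1 is invalid for a 1D tensor because axis 1 does not exist; the axis must be 0 or -1. Omitting axis (Option C) or passing 0 (Option D) would also make the code run, since softmax defaults to the last dimension, but Option A states the full rule.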
  3. Final Answer:

    The axis parameter should be 0 or -1 for this tensor -> Option A
  4. Quick Check:

    Softmax axis for 1D tensor = 0 or -1 [OK]
Hint: Axis must exist in tensor shape for softmax [OK]
Common Mistakes:
  • Using axis=1 on 1D tensor causes error
  • Thinking softmax can't apply to 1D tensors
  • Assuming axis must always be passed explicitly; the default (last dimension) already works for a 1D tensor
5. You want to build a neural network for multi-class classification with 4 classes. Which activation function should you use in the output layer to get probabilities for each class?
hard
A. ReLU
B. Sigmoid
C. Softmax
D. Tanh

Solution

  1. Step 1: Understand output layer needs for multi-class classification

    Output layer must produce probabilities that sum to 1 across all classes.
  2. Step 2: Identify activation function that outputs class probabilities

    Softmax converts raw scores into probabilities summing to 1, perfect for multi-class outputs.
  3. Final Answer:

    Softmax -> Option C
  4. Quick Check:

    Multi-class output uses Softmax = C [OK]
Hint: Softmax outputs probabilities summing to 1 [OK]
Common Mistakes:
  • Using sigmoid for multi-class instead of softmax
  • Choosing ReLU which doesn't output probabilities
  • Confusing tanh with probability output
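
The difference between softmax and element-wise sigmoid on a 4-class output can be sketched in plain Python. The logits are hypothetical, and both helpers are defined locally rather than taken from TensorFlow.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical raw scores for 4 classes

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(v) for v in logits]

print(f"softmax sum = {sum(softmax_probs):.3f}")
print(f"sigmoid sum = {sum(sigmoid_probs):.3f}")
```

Softmax outputs sum to 1 and can be read as one distribution over the 4 classes. Independent sigmoid outputs do not sum to 1, which suits multi-label problems rather than single-label multi-class ones.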