Softmax output layer in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Softmax output layer

Problem: You are building a neural network to classify images into 5 categories. The current model uses a softmax output layer, but the validation accuracy is much lower than the training accuracy.

Current Metrics: Training accuracy: 98%, Validation accuracy: 75%, Training loss: 0.05, Validation loss: 0.85

Issue: The model is overfitting. Training accuracy is very high (98%) while validation accuracy is much lower (75%), and validation loss (0.85) is far above training loss (0.05), which indicates poor generalization.
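
The diagnosis can be made concrete by comparing the gaps between the training and validation metrics. The sketch below uses the numbers quoted above; the function name `generalization_gap` and the 10-point threshold are illustrative assumptions, not part of the exercise.

```python
def generalization_gap(train_acc, val_acc, train_loss, val_loss):
    """Return (accuracy gap, loss gap) between training and validation."""
    return train_acc - val_acc, val_loss - train_loss

# Metrics quoted in the problem statement.
acc_gap, loss_gap = generalization_gap(0.98, 0.75, 0.05, 0.85)

# Assumed rule of thumb: a gap above 10 accuracy points suggests overfitting.
GAP_THRESHOLD = 0.10
is_overfitting = acc_gap > GAP_THRESHOLD

print(f"accuracy gap: {acc_gap:.2f}, loss gap: {loss_gap:.2f}, overfitting: {is_overfitting}")
```

The threshold is a judgment call; the useful signal is that both gaps are large and point in the same direction.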

Your Task

Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 95%.

Keep the softmax output layer as the final layer.

Do not change the number of output classes.

Use TensorFlow and Keras only.

Solution

TensorFlow

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Load CIFAR-10 as an example dataset (only 5 of its 10 classes are kept below)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Filter to first 5 classes only
train_filter = y_train.flatten() < 5
test_filter = y_test.flatten() < 5
x_train, y_train = x_train[train_filter], y_train[train_filter]
x_test, y_test = x_test[test_filter], y_test[test_filter]

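# Normalize pixel values to [0, 1]; float32 halves memory compared with the default float64
x_train, x_test = x_train.astype('float32') / 255.0, x_test.astype('float32') / 255.0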

# Convert labels to categorical
num_classes = 5
y_train_cat = tf.keras.utils.to_categorical(y_train, num_classes)
y_test_cat = tf.keras.utils.to_categorical(y_test, num_classes)

# Build model with dropout to reduce overfitting
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.4),  # randomly zero 40% of activations during training
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),  # randomly zero 30% of activations during training
    layers.Dense(num_classes, activation='softmax')  # one probability per class
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Train model
history = model.fit(
    x_train, y_train_cat,
    epochs=50,
    batch_size=64,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=0
)

# Evaluate on test data
test_loss, test_acc = model.evaluate(x_test, y_test_cat, verbose=0)

print(f'Test accuracy: {test_acc*100:.2f}%, Test loss: {test_loss:.4f}')
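
The `EarlyStopping` callback above watches `val_loss` with `patience=5`. The patience rule can be sketched in plain Python as follows. This is a simplified simulation rather than the Keras implementation, and the function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Simulate patience-based early stopping over a list of validation losses.

    Returns (stop_epoch, best_epoch), both 0-indexed. Training stops once
    `patience` consecutive epochs pass without a new lowest validation loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improve for four epochs, then drift upward.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.55, 0.58, 0.60, 0.65]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(f"stopped at epoch {stop_epoch}, best epoch was {best_epoch}")
```

With `restore_best_weights=True`, the model keeps the weights from the best epoch rather than the ones from the epoch where training stopped.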

Added dropout layers after dense layers to reduce overfitting.

Added early stopping to stop training when validation loss stops improving.

Reduced number of neurons in hidden layers to simplify the model.

Kept softmax output layer unchanged for multi-class classification.
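
To illustrate what a dropout layer does during training, here is a minimal sketch of inverted dropout in plain Python. It assumes a flat list of activations and a seeded random generator; the helper name `dropout` and the input values are illustrative and do not come from the solution code.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and
    scale the survivors by 1 / (1 - rate) to keep the expected sum unchanged."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
hidden = [1.0] * 10_000  # hypothetical hidden-layer activations
dropped = dropout(hidden, rate=0.4, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"kept fraction: {kept_fraction:.3f}, mean after dropout: {mean_after:.3f}")
```

Roughly 60% of the units survive and the mean activation stays close to its original value. At inference time Keras disables dropout, so predictions use the full network.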

Results Interpretation

Before: Training accuracy 98%, Validation accuracy 75%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 92%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.40

Adding dropout and early stopping reduces overfitting: validation accuracy improves while training accuracy drops slightly. The softmax output layer remains effective for multi-class classification.
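
For reference, the computation performed by the softmax output layer can be sketched in plain Python. The logits below are hypothetical scores for the 5 classes, and the helper is a minimal illustration rather than TensorFlow's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1, -1.0, 0.5]  # hypothetical raw outputs for 5 classes
probs = softmax(logits)

# The predicted class is the index with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs], "sum =", round(sum(probs), 6))
print("predicted class:", predicted_class)
```

Each output lies between 0 and 1 and the outputs sum to 1, which is why they can be paired with `categorical_crossentropy` and read as class probabilities.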

Bonus Experiment

Try replacing dropout with batch normalization layers and compare validation accuracy.

💡 Hint

Batch normalization can stabilize and speed up training. Because it normalizes with per-batch statistics, it also adds a mild regularizing effect that may reduce overfitting.
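
In the model above, each `layers.Dropout(...)` could be swapped for `layers.BatchNormalization()` to run this experiment. The sketch below shows, in plain Python, the training-time normalization such a layer applies to one feature across a batch. The batch values are hypothetical, and `gamma` and `beta` stand in for the layer's learned scale and shift.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a batch to zero mean and (near) unit
    variance, then apply a scale (gamma) and shift (beta)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]  # hypothetical activations of one unit over a batch of 4
normalized = batch_norm(batch)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print(f"mean: {norm_mean:.4f}, variance: {norm_var:.4f}")
```

This covers only the forward pass on a single batch; the Keras layer additionally tracks moving averages of the mean and variance for use at inference time.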

Practice

1. What is the main purpose of a softmax output layer in a TensorFlow model?

easy

A. To perform data normalization before training

B. To reduce the size of the input data

C. To convert raw outputs into probabilities that sum to 1

D. To increase the number of model layers

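Solution

Step 1: Understand softmax function role. Softmax exponentiates each raw output (logit) and divides it by the sum of all the exponentials.

Step 2: Check probability properties. Every resulting value lies between 0 and 1, and the values sum to 1, so they can be read as class probabilities.

Final Answer: C. To convert raw outputs into probabilities that sum to 1

Quick Check: Softmax outputs are probabilities that sum to 1.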

Solution

Step 1: Identify output layer size

Step 2: Choose correct activation

Final Answer:

Quick Check:

Solution

Step 1: Calculate exponentials of logits

Step 2: Compute softmax probabilities

Final Answer:

Quick Check:

Solution

Step 1: Check output layer units

Step 2: Validate activation usage

Final Answer:

Quick Check:

Solution

Step 1: Understand softmax output meaning

Step 2: Identify highest probability class

Final Answer:

Quick Check: