
Categorical cross-entropy loss in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Categorical cross-entropy loss
Problem: You are training a neural network to classify images into 5 categories. The model currently achieves 95% accuracy on the training data but only 70% on the validation data.
Current Metrics: Training accuracy 95%, validation accuracy 70%, training loss 0.15, validation loss 1.2.
Issue: The model is overfitting: it performs very well on training data but poorly on validation data.
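A quick way to confirm this pattern is to plot the learning curves: a widening gap between training and validation accuracy over epochs is the classic overfitting signature. A minimal sketch, assuming matplotlib is available and `history` is the object returned by `model.fit` with validation data:

import matplotlib.pyplot as plt

def plot_learning_curves(history):
    # history.history holds the per-epoch metrics recorded by model.fit
    plt.plot(history.history['accuracy'], label='training accuracy')
    plt.plot(history.history['val_accuracy'], label='validation accuracy')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend()
    plt.show()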
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You must keep using categorical cross-entropy loss (a quick refresher follows this list).
You can only change model architecture and training hyperparameters.
Do not change the dataset or labels.
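As a refresher on the required loss: categorical cross-entropy compares a one-hot label vector y with the predicted probability vector p, computing loss = -sum_i y_i * log(p_i), so only the probability assigned to the true class contributes. A tiny sketch with made-up numbers (the probabilities here are illustrative only):

import tensorflow as tf

# One-hot label: the true class is index 2; prediction is a softmax output
y_true = [[0.0, 0.0, 1.0, 0.0, 0.0]]
y_pred = [[0.05, 0.05, 0.80, 0.05, 0.05]]

cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())  # -log(0.80) ≈ 0.223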
Solution
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers, models

# Load example dataset (replace with actual data loading)
(X_train, y_train), (X_val, y_val) = tf.keras.datasets.cifar10.load_data()

# Filter dataset to 5 classes for simplicity
classes_to_keep = [0, 1, 2, 3, 4]
train_filter = np.isin(y_train, classes_to_keep).flatten()
val_filter = np.isin(y_val, classes_to_keep).flatten()
X_train, y_train = X_train[train_filter], y_train[train_filter]
X_val, y_val = X_val[val_filter], y_val[val_filter]

# Convert labels to categorical
num_classes = 5
y_train_cat = tf.keras.utils.to_categorical(y_train, num_classes)
y_val_cat = tf.keras.utils.to_categorical(y_val, num_classes)

# Normalize images
X_train = X_train.astype('float32') / 255.0
X_val = X_val.astype('float32') / 255.0

# Define model with dropout and reduced complexity
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=X_train.shape[1:]),
    layers.MaxPooling2D((2,2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Use early stopping
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train, y_train_cat, epochs=50, batch_size=64,
                    validation_data=(X_val, y_val_cat), callbacks=[early_stop])
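
After training, it is worth verifying the task's targets directly. A short check, assuming the model and data from the code above:

# Compare final metrics against the task's targets
train_loss, train_acc = model.evaluate(X_train, y_train_cat, verbose=0)
val_loss, val_acc = model.evaluate(X_val, y_val_cat, verbose=0)
print(f"Training accuracy: {train_acc:.2%} (target: below 92%)")
print(f"Validation accuracy: {val_acc:.2%} (target: at least 85%)")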
Key Changes
Added dropout layers after the convolutional blocks and the dense layer to reduce overfitting (see the sketch after this list).
Reduced the dense layer to 64 units to lower the model's capacity.
Added early stopping so training halts once validation loss stops improving, restoring the best weights.
Kept categorical cross-entropy loss, as required.
Used the Adam optimizer with a moderate learning rate of 0.001.
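
To make the dropout point concrete: Keras Dropout is only active during training and becomes a pass-through at inference, which is why the regularization costs nothing at prediction time. A minimal standalone sketch, separate from the solution code:

import tensorflow as tf

layer = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 8))
# During training: roughly half the activations are zeroed, the rest scaled by 1/(1-0.5) = 2
print(layer(x, training=True).numpy())
# During inference: the input passes through unchanged
print(layer(x, training=False).numpy())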
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 70%, Training loss 0.15, Validation loss 1.2

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.3, Validation loss 0.5

Adding dropout and reducing model capacity curbs overfitting, and early stopping prevents training past the point where validation loss stops improving. Together these changes raise validation accuracy while keeping training accuracy reasonable, which indicates better generalization.
Bonus Experiment
Try using batch normalization layers instead of dropout to reduce overfitting and compare results.
💡 Hint
Insert batch normalization layers after convolutional layers and before activation functions to stabilize training.
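
A minimal sketch of the batch-normalization variant, reusing the imports and data pipeline from the solution above; each BatchNormalization layer sits between a convolution and its ReLU activation, as the hint suggests:

model_bn = models.Sequential([
    layers.Conv2D(32, (3, 3), input_shape=X_train.shape[1:]),
    layers.BatchNormalization(),   # normalize pre-activations before ReLU
    layers.Activation('relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3)),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

model_bn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])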