
Categorical cross-entropy loss in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Categorical cross-entropy loss
Problem: You are training a neural network to classify images into 5 categories. The model currently achieves 95% accuracy on training data but only 70% on validation data.
Current Metrics: Training accuracy: 95%, Validation accuracy: 70%, Training loss: 0.15, Validation loss: 1.2
Issue: The model is overfitting: it performs very well on training data but poorly on validation data.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You must keep using categorical cross-entropy loss.
You can only change model architecture and training hyperparameters.
Do not change the dataset or labels.
Solution
TensorFlow
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Load example dataset (replace with actual data loading).
# CIFAR-10's test split is used as the validation set here.
(X_train, y_train), (X_val, y_val) = tf.keras.datasets.cifar10.load_data()

# Keep only the first 5 classes so the data matches the 5-category problem
classes_to_keep = [0,1,2,3,4]
train_filter = np.isin(y_train, classes_to_keep).flatten()
val_filter = np.isin(y_val, classes_to_keep).flatten()
X_train, y_train = X_train[train_filter], y_train[train_filter]
X_val, y_val = X_val[val_filter], y_val[val_filter]

# Convert labels to categorical
num_classes = 5
y_train_cat = tf.keras.utils.to_categorical(y_train, num_classes)
y_val_cat = tf.keras.utils.to_categorical(y_val, num_classes)

# Normalize images
X_train = X_train.astype('float32') / 255.0
X_val = X_val.astype('float32') / 255.0

# Define model with dropout and reduced complexity
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=X_train.shape[1:]),
    layers.MaxPooling2D((2,2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Use early stopping
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train, y_train_cat, epochs=50, batch_size=64,
                    validation_data=(X_val, y_val_cat), callbacks=[early_stop])
Added dropout layers after each pooling step and after the dense layer to reduce overfitting.
Limited the dense layer to 64 units to lower model capacity.
Used early stopping (patience of 5 epochs, restoring the best weights) to stop training when validation loss stops improving.
Kept categorical cross-entropy loss as required.
Used the Adam optimizer with a learning rate of 0.001, which is also its Keras default.
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 70%, Training loss 0.15, Validation loss 1.2

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.3, Validation loss 0.5

Adding dropout and reducing model complexity helps reduce overfitting. Early stopping prevents training too long. This improves validation accuracy while keeping training accuracy reasonable, showing better generalization.
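To check whether the gap between training and validation accuracy has actually narrowed, one option is to read the per-epoch metrics that Keras records in `history.history`. The sketch below uses a hypothetical helper, `summarize_gap`, which is not part of the original solution. It assumes the dictionary has the keys `accuracy`, `val_accuracy`, and `val_loss`, which is what `metrics=['accuracy']` with validation data normally produces. The sample numbers are made up for illustration.

```python
def summarize_gap(history_dict):
    """Return metrics at the epoch with the lowest validation loss.

    When early stopping triggers with restore_best_weights=True,
    this is the epoch whose weights are restored.
    """
    val_losses = history_dict["val_loss"]
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    train_acc = history_dict["accuracy"][best_epoch]
    val_acc = history_dict["val_accuracy"][best_epoch]
    return {
        "best_epoch": best_epoch + 1,  # 1-based, like the Keras progress log
        "train_acc": train_acc,
        "val_acc": val_acc,
        "gap": train_acc - val_acc,
    }


# Illustrative values only; in practice pass history.history from model.fit.
example_history = {
    "accuracy": [0.60, 0.78, 0.88, 0.93],
    "val_accuracy": [0.58, 0.75, 0.85, 0.84],
    "val_loss": [1.10, 0.70, 0.52, 0.58],
}
summary = summarize_gap(example_history)
print(summary)
```

A small gap at the best epoch, together with validation accuracy at or above the 85% target, is the pattern the task asks for.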
Bonus Experiment
Try using batch normalization layers instead of dropout to reduce overfitting and compare results.
💡 Hint
Insert batch normalization layers after convolutional layers and before activation functions to stabilize training.
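
The ordering described in the hint (convolution, then batch normalization, then activation) could be sketched as follows. This is one possible layout, not a tuned solution: the filter counts mirror the dropout model above, the 32x32x3 input shape assumes CIFAR-10 images, and `use_bias=False` is an optional choice made here because batch normalization learns its own offset.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 5


def conv_bn_block(filters):
    """Conv -> BatchNorm -> ReLU -> MaxPool, with BN placed before the activation."""
    return [
        layers.Conv2D(filters, (3, 3), use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
    ]


bn_model = models.Sequential(
    [tf.keras.Input(shape=(32, 32, 3))]  # assumed CIFAR-10 image shape
    + conv_bn_block(32)
    + conv_bn_block(64)
    + [
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

# Same loss as the main experiment, so the two runs stay comparable.
bn_model.compile(optimizer="adam",
                 loss="categorical_crossentropy",
                 metrics=["accuracy"])
bn_model.summary()
```

Training this model with the same `fit` call and early stopping callback as the dropout version keeps the comparison fair.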

Practice

1. What does categorical cross-entropy loss measure in a classification model?
easy
A. The speed of model training
B. The total number of correct predictions
C. The difference between true categories and predicted probabilities
D. The size of the input data

Solution

  1. Step 1: Understand the purpose of categorical cross-entropy

    Categorical cross-entropy loss calculates how far the predicted probabilities are from the true categories in classification tasks.
  2. Step 2: Compare options with the definition

    Only option C, "The difference between true categories and predicted probabilities", matches this definition; the other options describe unrelated concepts.
  3. Final Answer:

    The difference between true categories and predicted probabilities -> Option C
  4. Quick Check:

    Loss measures prediction error = the difference between true categories and predicted probabilities [OK]
Hint: Loss compares the true labels with the predicted probabilities [OK]
Common Mistakes:
  • Confusing loss with accuracy
  • Thinking loss measures training speed
  • Mixing input data size with loss
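
For reference, the quantity described in this answer can be written out for a single example. The symbols are introduced here for illustration and do not come from the original page: $C$ is the number of classes, $y_i$ is the one-hot true label, $p_i$ is the predicted probability for class $i$, and $c$ is the index of the true class.

```latex
L = -\sum_{i=1}^{C} y_i \log p_i
  \quad\Longrightarrow\quad
L = -\log p_c \quad \text{when } y \text{ is one-hot with } y_c = 1 .
```

Because every $y_i$ except $y_c$ is zero, only the probability assigned to the true class affects the loss: it approaches 0 as $p_c$ approaches 1 and grows without bound as $p_c$ approaches 0.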
2. Which of the following is the correct way to create a categorical cross-entropy loss in TensorFlow when your model outputs probabilities?
easy
A. tf.keras.losses.MeanSquaredError()
B. tf.keras.losses.CategoricalCrossentropy(from_logits=True)
C. tf.keras.losses.BinaryCrossentropy(from_logits=False)
D. tf.keras.losses.CategoricalCrossentropy(from_logits=False)

Solution

  1. Step 1: Identify the correct loss function for probabilities

    When the model outputs probabilities, set from_logits=False in CategoricalCrossentropy.
  2. Step 2: Check options for correct usage

    Option D correctly pairs CategoricalCrossentropy with from_logits=False. Option B wrongly sets from_logits=True, and options A and C use loss functions that do not fit multi-class probability outputs.
  3. Final Answer:

    tf.keras.losses.CategoricalCrossentropy(from_logits=False) -> Option D
  4. Quick Check:

    Probabilities output means from_logits=False [OK]
Hint: Set from_logits=False if outputs are probabilities [OK]
Common Mistakes:
  • Using from_logits=True with probabilities
  • Choosing binary cross-entropy for multi-class
  • Using mean squared error for classification
3. Given the following code, what will be the output loss value?
import tensorflow as tf
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
y_true = [[0, 1, 0]]
y_pred = [[0.1, 0.8, 0.1]]
loss = loss_fn(y_true, y_pred).numpy()
print(round(loss, 3))
medium
A. 0.000
B. 0.223
C. 0.500
D. 1.609

Solution

  1. Step 1: Understand the inputs to the loss function

    y_true is one-hot with class index 1 marked as the true class; y_pred assigns probability 0.8 to that class.
  2. Step 2: Calculate categorical cross-entropy

    Loss = -log(predicted probability of true class) = -ln(0.8) ≈ 0.223, where log is the natural logarithm.
  3. Final Answer:

    0.223 -> Option B
  4. Quick Check:

    Loss = -log(0.8) ≈ 0.223 [OK]
Hint: Loss = -log(probability of true class) [OK]
Common Mistakes:
  • Using raw logits without from_logits=True
  • Calculating log of wrong class probability
  • Rounding errors in loss value
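
The arithmetic in this solution can be reproduced without TensorFlow. The sketch below uses only the standard library and the same `y_true` and `y_pred` values as the question. Keras additionally clips predictions and averages over the batch, which does not change the result for this single example.

```python
import math

y_true = [0, 1, 0]
y_pred = [0.1, 0.8, 0.1]

# Sum over classes of -y_i * log(p_i); zero entries of y_true contribute nothing.
loss = -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
print(round(loss, 3))  # 0.223
```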
4. Identify the error in this TensorFlow code snippet for categorical cross-entropy loss:
import tensorflow as tf
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
y_true = [[0, 1, 0]]
y_pred = [[0.1, 0.8, 0.1]]
loss = loss_fn(y_true, y_pred).numpy()
print(loss)
medium
A. from_logits should be False because y_pred are probabilities
B. y_true should be integers, not one-hot vectors
C. Loss function should be BinaryCrossentropy
D. No error, code is correct

Solution

  1. Step 1: Check the from_logits parameter

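    from_logits=True tells the loss that y_pred holds raw scores, so it applies softmax internally; here y_pred already holds probabilities that sum to 1.
  2. Step 2: Identify the mismatch

    The code runs without raising an exception, but softmax is applied to values that are already probabilities, so the reported loss is wrong; from_logits should be False.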
  3. Final Answer:

    from_logits should be False because y_pred are probabilities -> Option A
  4. Quick Check:

    Probabilities output means from_logits=False [OK]
Hint: Match from_logits to output type: True for logits, False for probabilities [OK]
Common Mistakes:
  • Confusing logits with probabilities
  • Using wrong loss function for multi-class
  • Assuming labels must be integer class indices instead of one-hot vectors
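
To see why this mismatch matters, the sketch below imitates in plain Python what happens when probabilities are passed as if they were logits: softmax is applied a second time before the loss is taken. The `softmax` helper is written here for illustration and is not TensorFlow code.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


y_true = [0, 1, 0]
y_pred = [0.1, 0.8, 0.1]  # already probabilities

# Intended loss: read the true-class probability directly.
intended_loss = -math.log(y_pred[1])

# What from_logits=True effectively does: softmax first, then the log.
squashed = softmax(y_pred)
actual_loss = -math.log(squashed[1])

print(round(intended_loss, 3), round(actual_loss, 3))
```

The second softmax flattens the distribution (0.8 becomes roughly 0.5), so the loss comes out near 0.69 instead of 0.223 even though no error is raised.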
5. You have a model outputting raw logits for 4 classes. Which is the correct way to compute categorical cross-entropy loss during training in TensorFlow?
hard
A. Use tf.keras.losses.CategoricalCrossentropy(from_logits=True) with one-hot labels
B. Use tf.keras.losses.CategoricalCrossentropy(from_logits=False) with one-hot labels
C. Use tf.keras.losses.BinaryCrossentropy(from_logits=True) with integer labels
D. Use tf.keras.losses.MeanSquaredError() with one-hot labels

Solution

  1. Step 1: Understand model output and label format

    The model outputs raw logits (not probabilities), and labels are one-hot encoded for multi-class classification.
  2. Step 2: Choose correct loss function and parameters

    For raw logits, set from_logits=True in CategoricalCrossentropy; binary cross-entropy and mean squared error are incorrect for multi-class one-hot labels.
  3. Final Answer:

    Use tf.keras.losses.CategoricalCrossentropy(from_logits=True) with one-hot labels -> Option A
  4. Quick Check:

    Raw logits + one-hot labels = from_logits=True [OK]
Hint: Raw logits need from_logits=True in categorical cross-entropy [OK]
Common Mistakes:
  • Using from_logits=False with logits
  • Using binary cross-entropy for multi-class
  • Using mean squared error for classification
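
As a rough illustration of what computing the loss from logits means, the sketch below evaluates cross-entropy directly from raw scores using the log-sum-exp identity, -log(softmax(z)[c]) = logsumexp(z) - z[c]. It shows the mathematically equivalent computation under that assumption and is not a description of TensorFlow's internals. The logits are hypothetical values for a 4-class problem.

```python
import math


def cross_entropy_from_logits(one_hot, logits):
    """Cross-entropy for one example, computed as logsumexp(logits) - logits[true]."""
    top = max(logits)
    # Subtracting the max before exponentiating avoids overflow for large scores.
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return sum(t * (log_sum_exp - z) for t, z in zip(one_hot, logits))


logits = [2.0, -1.0, 0.5, 0.0]  # hypothetical raw scores for 4 classes
one_hot = [1, 0, 0, 0]          # true class is index 0

loss = cross_entropy_from_logits(one_hot, logits)
print(round(loss, 4))
```

Working from logits avoids taking the log of a probability that has already been rounded to 0 or 1, which is one commonly cited reason for preferring from_logits=True when the model has no softmax layer.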