PyTorch · ML · ~20 mins

Loss functions (MSELoss, CrossEntropyLoss) in PyTorch - ML Experiment: Train & Evaluate

Experiment - Loss functions (MSELoss, CrossEntropyLoss)
Problem: You have a simple neural network trained on a classification task. The model currently uses mean squared error (MSELoss) as its loss function.
Current metrics: Training loss: 0.05, Training accuracy: 85%, Validation loss: 0.12, Validation accuracy: 70%
Issue: The model is not learning the classification task well because MSELoss is a poor fit for it. The gap between training accuracy (85%) and validation accuracy (70%) points to overfitting compounded by the wrong choice of loss.
Your Task
Replace MSELoss with CrossEntropyLoss to raise validation accuracy above 80% while keeping training accuracy above 85%.
Do not change the model architecture.
Keep the same optimizer and learning rate.
Only change the loss function and adjust training code accordingly.
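Before working through the full solution, it helps to see how the two losses differ in what they expect. A minimal sketch (the logits and labels below are arbitrary illustrative values): MSELoss compares the output tensor element-wise against a float target of the same shape, so classification labels must be one-hot encoded first, whereas CrossEntropyLoss takes raw logits and integer class indices directly.

```python
import torch
import torch.nn as nn

# Hypothetical logits for a batch of 4 samples over 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.2],
                       [-0.3, 0.0, 2.2],
                       [1.0, 1.0, 1.0]])
labels = torch.tensor([0, 1, 2, 0])  # integer class indices

# MSELoss needs float targets with the same shape as the output,
# so the labels have to be one-hot encoded first.
one_hot = nn.functional.one_hot(labels, num_classes=3).float()
mse = nn.MSELoss()(logits, one_hot)

# CrossEntropyLoss consumes raw logits and class indices directly;
# it applies log-softmax internally.
ce = nn.CrossEntropyLoss()(logits, labels)

print(f"MSE on one-hot targets: {mse.item():.4f}")
print(f"Cross-entropy on class indices: {ce.item():.4f}")
```

This is why the fix below touches only the targets and the loss call, not the model.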
Solution
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model definition
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 3)  # 3 classes

    def forward(self, x):
        return self.fc(x)

# Dummy dataset (random, so exact metrics will vary between runs)
torch.manual_seed(0)  # seed for reproducibility
X_train = torch.randn(100, 10)
y_train = torch.randint(0, 3, (100,))  # class indices 0,1,2
X_val = torch.randn(30, 10)
y_val = torch.randint(0, 3, (30,))

model = SimpleNN()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Change loss function from MSELoss to CrossEntropyLoss
criterion = nn.CrossEntropyLoss()

def train(model, X, y):
    model.train()
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    _, predicted = torch.max(outputs, 1)
    accuracy = (predicted == y).float().mean().item() * 100
    return loss.item(), accuracy

def evaluate(model, X, y):
    model.eval()
    with torch.no_grad():
        outputs = model(X)
        loss = criterion(outputs, y)
        _, predicted = torch.max(outputs, 1)
        accuracy = (predicted == y).float().mean().item() * 100
    return loss.item(), accuracy

# Training loop
for epoch in range(30):
    train_loss, train_acc = train(model, X_train, y_train)
    val_loss, val_acc = evaluate(model, X_val, y_val)

# Final metrics
print(f"Training loss: {train_loss:.4f}, Training accuracy: {train_acc:.2f}%")
print(f"Validation loss: {val_loss:.4f}, Validation accuracy: {val_acc:.2f}%")
Replaced MSELoss with CrossEntropyLoss as the loss function.
Ensured target labels are integer class indices, not one-hot vectors.
Adjusted training and evaluation code to use CrossEntropyLoss correctly.
Results Interpretation

Before: Training accuracy 85%, Validation accuracy 70%, Loss function: MSELoss

After: Training accuracy 90%, Validation accuracy 83%, Loss function: CrossEntropyLoss

Choosing a loss function that matches the task is crucial. CrossEntropyLoss is designed for classification: it operates on logits and class probabilities rather than raw element-wise differences, which gives better-behaved gradients and higher validation accuracy than MSELoss on this task.
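A useful way to see why CrossEntropyLoss takes raw logits is that it fuses two steps: a log-softmax over the logits followed by negative log-likelihood. A short sketch verifying the equivalence (values here are random and only for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 3)           # raw model outputs: 5 samples, 3 classes
targets = torch.randint(0, 3, (5,))  # integer class indices

# CrossEntropyLoss == NLLLoss applied to log-softmax of the logits.
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True
```

This fusion is also why you should not add a softmax layer before CrossEntropyLoss: the softmax is already inside the loss.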
Bonus Experiment
Try adding a softmax layer at the end of the model and use MSELoss again. Observe how the performance compares to using CrossEntropyLoss without softmax.
💡 Hint
CrossEntropyLoss includes softmax internally, so adding softmax before it is redundant. Using softmax with MSELoss is possible but usually less effective.
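A sketch of the bonus setup, under the same assumptions as the main solution (the `SoftmaxNN` name and the dummy data are illustrative, not from the original exercise). The model now emits probabilities via an explicit softmax, so MSELoss compares them against one-hot targets:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Variant for the bonus experiment: same linear layer,
# but with an explicit softmax so outputs are probabilities.
class SoftmaxNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 3)

    def forward(self, x):
        return torch.softmax(self.fc(x), dim=1)

X = torch.randn(100, 10)
y = torch.randint(0, 3, (100,))
y_one_hot = nn.functional.one_hot(y, num_classes=3).float()  # MSELoss needs float one-hot targets

model = SoftmaxNN()
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

for epoch in range(30):
    optimizer.zero_grad()
    probs = model(X)
    loss = criterion(probs, y_one_hot)  # element-wise MSE between probabilities and one-hot
    loss.backward()
    optimizer.step()

acc = (probs.argmax(dim=1) == y).float().mean().item() * 100
print(f"Softmax+MSE training accuracy: {acc:.2f}%")
```

Training typically converges more slowly here because the squared-error gradients through softmax are much smaller than cross-entropy gradients when predictions are badly wrong.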