PyTorch · Comparison · Beginner · 4 min read

CrossEntropyLoss vs NLLLoss in PyTorch: Key Differences and Usage

In PyTorch, CrossEntropyLoss combines LogSoftmax and NLLLoss in one step and expects raw scores (logits) as input, while NLLLoss requires log-probabilities as input. Use CrossEntropyLoss for direct logits output from your model, and NLLLoss when you apply LogSoftmax manually before the loss.

Quick Comparison

This table summarizes the main differences between CrossEntropyLoss and NLLLoss in PyTorch.

| Aspect | CrossEntropyLoss | NLLLoss |
| --- | --- | --- |
| Input to loss | Raw logits (unnormalized scores) | Log-probabilities (output of LogSoftmax) |
| Includes softmax? | Yes, applies LogSoftmax internally | No, expects log-probabilities already computed |
| Typical use case | Standard multi-class classification | When you apply LogSoftmax separately |
| Input shape | (N, C), where C = number of classes | (N, C), where C = number of classes |
| Target format | Class indices (0 to C-1) | Class indices (0 to C-1) |
| Numerical stability | Handled internally | Depends on manual LogSoftmax |

Key Differences

CrossEntropyLoss is a convenience loss function that combines LogSoftmax and NLLLoss in one step. This means you can feed it raw model outputs (called logits), and it will apply the log-softmax transformation internally before calculating the negative log likelihood loss.

On the other hand, NLLLoss expects that you have already applied LogSoftmax to your model outputs. It takes log-probabilities as input and computes the negative log likelihood loss directly. This separation allows more flexibility if you want to manipulate probabilities or log-probabilities before loss calculation.

In practice, CrossEntropyLoss is easier and less error-prone because it handles the softmax step internally with better numerical stability. Use NLLLoss only if you need to customize the log-probabilities or combine it with other operations.
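The equivalence described above is easy to verify directly: for the same logits and targets, CrossEntropyLoss on raw logits and NLLLoss on manually computed log-probabilities produce the same value. A minimal check, using arbitrary example logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Arbitrary logits for 2 samples and 3 classes
logits = torch.tensor([[0.2, 1.5, -0.3],
                       [1.0, 0.1, 0.7]])
targets = torch.tensor([1, 0])

# Path 1: CrossEntropyLoss applied directly to logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Path 2: LogSoftmax followed by NLLLoss
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True: both paths compute the same loss
```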


Code Comparison

```python
import torch
import torch.nn as nn

# Sample logits (raw scores) for 3 samples and 4 classes
logits = torch.tensor([[1.0, 2.0, 0.5, 0.1],
                       [0.1, 0.2, 0.3, 0.4],
                       [2.0, 1.0, 0.1, 0.5]])

# Target class indices
targets = torch.tensor([1, 3, 0])

# Using CrossEntropyLoss directly on logits
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)
print(f"CrossEntropyLoss: {loss.item():.4f}")
```

Output

```
CrossEntropyLoss: 0.7837
```

NLLLoss Equivalent

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same logits as before
logits = torch.tensor([[1.0, 2.0, 0.5, 0.1],
                       [0.1, 0.2, 0.3, 0.4],
                       [2.0, 1.0, 0.1, 0.5]])

# Apply LogSoftmax manually
log_probs = F.log_softmax(logits, dim=1)

# Target class indices
targets = torch.tensor([1, 3, 0])

# Using NLLLoss on log-probabilities
criterion = nn.NLLLoss()
loss = criterion(log_probs, targets)
print(f"NLLLoss: {loss.item():.4f}")
```

Output

```
NLLLoss: 0.7837
```

When to Use Which

Choose CrossEntropyLoss when your model outputs raw scores (logits) and you want a simple, stable loss calculation for multi-class classification. It is the most common and recommended choice for classification tasks.

Choose NLLLoss if you need to apply LogSoftmax manually, for example, when combining with other custom operations or when you want explicit control over the log-probabilities before loss calculation.

In general, prefer CrossEntropyLoss for simplicity and numerical stability unless you have a specific reason to separate the softmax step.
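One common pattern where the separation pays off is a model that bakes LogSoftmax into its final layer, so the forward pass emits log-probabilities directly. This is a sketch with a made-up toy architecture (10 input features, 4 classes), not a prescribed design:

```python
import torch
import torch.nn as nn

# Toy classifier whose final layer emits log-probabilities.
# Because LogSoftmax is part of the model, NLLLoss is the matching loss.
model = nn.Sequential(
    nn.Linear(10, 4),
    nn.LogSoftmax(dim=1),
)

x = torch.randn(3, 10)            # 3 samples, 10 features
targets = torch.tensor([0, 2, 1])

log_probs = model(x)              # already log-probabilities
loss = nn.NLLLoss()(log_probs, targets)
loss.backward()                   # gradients flow as usual
```

A side benefit of this layout is that `model(x).exp()` gives you class probabilities at inference time without any extra post-processing.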

Key Takeaways

CrossEntropyLoss combines LogSoftmax and NLLLoss internally and takes raw logits as input.
NLLLoss requires log-probabilities as input, so you must apply LogSoftmax manually before it.
Use CrossEntropyLoss for standard classification tasks for simplicity and stability.
Use NLLLoss only when you need custom control over log-probabilities.
Both losses expect target labels as class indices, not one-hot vectors (recent PyTorch versions also let CrossEntropyLoss accept class probabilities, but class indices remain the standard usage).
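If your labels arrive one-hot encoded, the last point means you should convert them to class indices before passing them to either loss. `argmax` along the class dimension does this:

```python
import torch

# One-hot labels for 3 samples and 4 classes
one_hot = torch.tensor([[0, 1, 0, 0],
                        [0, 0, 0, 1],
                        [1, 0, 0, 0]])

# Convert to the class-index format both losses expect
targets = one_hot.argmax(dim=1)
print(targets)  # tensor([1, 3, 0])
```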