PyTorch · Comparison · Beginner · 4 min read

CrossEntropyLoss vs NLLLoss in PyTorch: Key Differences and Usage

In PyTorch, CrossEntropyLoss combines LogSoftmax and NLLLoss in one step and expects raw scores (logits) as input, while NLLLoss requires log-probabilities as input. Use CrossEntropyLoss for direct logits output from your model, and NLLLoss when you apply LogSoftmax manually before the loss.

Quick Comparison

This table summarizes the main differences between CrossEntropyLoss and NLLLoss in PyTorch.

| Aspect | CrossEntropyLoss | NLLLoss |
| --- | --- | --- |
| Input to loss | Raw logits (unnormalized scores) | Log-probabilities (output of LogSoftmax) |
| Includes softmax? | Yes, applies LogSoftmax internally | No, expects log-probabilities already computed |
| Typical use case | Standard multi-class classification | When you apply LogSoftmax separately |
| Input shape | (N, C), where C = number of classes | (N, C), where C = number of classes |
| Target format | Class indices (0 to C-1) | Class indices (0 to C-1) |
| Numerical stability | Handled internally | Depends on manual LogSoftmax |

Key Differences

CrossEntropyLoss is a convenience loss function that combines LogSoftmax and NLLLoss in one step. This means you can feed it raw model outputs (called logits), and it will apply the log-softmax transformation internally before calculating the negative log likelihood loss.

On the other hand, NLLLoss expects that you have already applied LogSoftmax to your model outputs. It takes log-probabilities as input and computes the negative log likelihood loss directly. This separation allows more flexibility if you want to manipulate probabilities or log-probabilities before loss calculation.

In practice, CrossEntropyLoss is easier and less error-prone because it handles the softmax step internally with better numerical stability. Use NLLLoss only if you need to customize the log-probabilities or combine it with other operations.
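The equivalence described above is easy to verify directly: for the same logits and targets, CrossEntropyLoss on raw logits and NLLLoss on manually computed log-probabilities produce the same value. A minimal check, using arbitrary example logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Arbitrary logits for 2 samples and 3 classes
logits = torch.tensor([[0.2, 1.5, -0.3],
                       [1.0, 0.1, 0.7]])
targets = torch.tensor([1, 0])

# Path 1: CrossEntropyLoss applied directly to logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Path 2: LogSoftmax followed by NLLLoss
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True: both paths compute the same loss
```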


Code Comparison

```python
import torch
import torch.nn as nn

# Sample logits (raw scores) for 3 samples and 4 classes
logits = torch.tensor([[1.0, 2.0, 0.5, 0.1],
                       [0.1, 0.2, 0.3, 0.4],
                       [2.0, 1.0, 0.1, 0.5]])

# Target class indices
targets = torch.tensor([1, 3, 0])

# Using CrossEntropyLoss directly on logits
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)
print(f"CrossEntropyLoss: {loss.item():.4f}")
```

Output

```
CrossEntropyLoss: 0.7837
```

NLLLoss Equivalent

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same logits as before
logits = torch.tensor([[1.0, 2.0, 0.5, 0.1],
                       [0.1, 0.2, 0.3, 0.4],
                       [2.0, 1.0, 0.1, 0.5]])

# Apply LogSoftmax manually
log_probs = F.log_softmax(logits, dim=1)

# Target class indices
targets = torch.tensor([1, 3, 0])

# Using NLLLoss on log-probabilities
criterion = nn.NLLLoss()
loss = criterion(log_probs, targets)
print(f"NLLLoss: {loss.item():.4f}")
```

Output

```
NLLLoss: 0.7837
```

When to Use Which

Choose CrossEntropyLoss when your model outputs raw scores (logits) and you want a simple, stable loss calculation for multi-class classification. It is the most common and recommended choice for classification tasks.

Choose NLLLoss if you need to apply LogSoftmax manually, for example, when combining with other custom operations or when you want explicit control over the log-probabilities before loss calculation.

In general, prefer CrossEntropyLoss for simplicity and numerical stability unless you have a specific reason to separate the softmax step.
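One common pattern where the separation pays off is a model that bakes LogSoftmax into its final layer, so the forward pass emits log-probabilities directly. This is a sketch with a made-up toy architecture (10 input features, 4 classes), not a prescribed design:

```python
import torch
import torch.nn as nn

# Toy classifier whose final layer emits log-probabilities.
# Because LogSoftmax is part of the model, NLLLoss is the matching loss.
model = nn.Sequential(
    nn.Linear(10, 4),
    nn.LogSoftmax(dim=1),
)

x = torch.randn(3, 10)            # 3 samples, 10 features
targets = torch.tensor([0, 2, 1])

log_probs = model(x)              # already log-probabilities
loss = nn.NLLLoss()(log_probs, targets)
loss.backward()                   # gradients flow as usual
```

A side benefit of this layout is that `model(x).exp()` gives you class probabilities at inference time without any extra post-processing.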

Key Takeaways

CrossEntropyLoss combines LogSoftmax and NLLLoss internally and takes raw logits as input.
NLLLoss requires log-probabilities as input, so you must apply LogSoftmax manually before it.
Use CrossEntropyLoss for standard classification tasks for simplicity and stability.
Use NLLLoss only when you need custom control over log-probabilities.
Both losses expect target labels as class indices, not one-hot vectors (recent PyTorch versions also let CrossEntropyLoss accept class probabilities, but class indices remain the standard usage).
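If your labels arrive one-hot encoded, the last point means you should convert them to class indices before passing them to either loss. `argmax` along the class dimension does this:

```python
import torch

# One-hot labels for 3 samples and 4 classes
one_hot = torch.tensor([[0, 1, 0, 0],
                        [0, 0, 0, 1],
                        [1, 0, 0, 0]])

# Convert to the class-index format both losses expect
targets = one_hot.argmax(dim=1)
print(targets)  # tensor([1, 3, 0])
```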