How to Fix NaN Loss in PyTorch: Causes and Solutions
Why This Happens
NaN loss occurs when the model's calculations produce undefined or infinite values. This can happen if your input data contains NaNs or Infs, if your model outputs extreme values that overflow, or if your loss function receives invalid inputs such as division by zero or the log of zero.
For example, using a log function on zero or negative numbers can cause NaN loss.
```python
import torch
import torch.nn as nn

# Example causing NaN loss due to log(0)
inputs = torch.tensor([[0.0, 0.0], [0.0, 0.0]], requires_grad=True)
targets = torch.tensor([1, 0])
loss_fn = nn.NLLLoss()

log_probs = torch.log(inputs)       # log(0) = -inf
loss = loss_fn(log_probs, targets)  # loss becomes inf
loss.backward()                     # gradients become NaN
```
The Fix
Fix NaN loss by ensuring that inputs to functions like torch.log are strictly positive: add a small value (epsilon) before taking the log to avoid log(0). Also check your data for NaNs/Infs and clip gradients to prevent exploding values.
```python
import torch
import torch.nn as nn

# Fix by adding epsilon to inputs before log
inputs = torch.tensor([[1e-8, 1e-8], [1e-8, 1e-8]], requires_grad=True)
targets = torch.tensor([1, 0])
loss_fn = nn.NLLLoss()

log_probs = torch.log(inputs)  # inputs already include epsilon, so log is finite
loss = loss_fn(log_probs, targets)
loss.backward()
print(f"Loss value: {loss.item():.6f}")
```
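In practice, a more robust fix than adding epsilon by hand is to avoid the manual torch.log entirely. A sketch of this approach: pass raw logits to nn.CrossEntropyLoss, which applies log_softmax internally in a numerically stable way (the logit values below are placeholders for a model's output).

```python
import torch
import torch.nn as nn

# Raw, unnormalized logits as a model head would produce them (illustrative values)
logits = torch.tensor([[0.2, -1.5], [0.8, 0.3]], requires_grad=True)
targets = torch.tensor([1, 0])

# CrossEntropyLoss = log_softmax + NLLLoss, computed stably even for
# extreme logit values, so log(0) never occurs
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)
loss.backward()
```

The same idea applies if you want explicit log-probabilities: use torch.log_softmax on logits rather than torch.log on probabilities.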
Prevention
To avoid NaN loss in the future, always preprocess your data to remove or replace NaNs and Infs. Use torch.isnan() and torch.isinf() to check tensors. Apply gradient clipping with torch.nn.utils.clip_grad_norm_ to keep gradients stable. Also, monitor your loss values during training and stop if NaNs appear.
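A minimal training-loop sketch combining these checks (the linear model and random batch here are stand-ins for your own model and data):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in model for illustration
batch = torch.randn(8, 4)
targets = torch.randint(0, 2, (8,))

# 1. Check inputs for NaNs/Infs before they reach the model
assert not torch.isnan(batch).any(), "NaN in input batch"
assert not torch.isinf(batch).any(), "Inf in input batch"

loss = nn.CrossEntropyLoss()(model(batch), targets)
loss.backward()

# 2. Clip gradients so their total norm stays bounded
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# 3. Stop training as soon as the loss goes non-finite
if not torch.isfinite(loss):
    raise RuntimeError(f"Non-finite loss: {loss.item()}")
```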
Related Errors
Other common errors related to NaN loss include:
- Inf gradients: Caused by very large values; fix with gradient clipping.
- Division by zero: Happens in custom loss functions; add small epsilon to denominators.
- Invalid label indices: Using wrong target labels in classification losses causes runtime errors.
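For the division-by-zero case, a sketch of the epsilon-in-the-denominator fix in a hypothetical custom loss:

```python
import torch

def relative_error_loss(pred, target, eps=1e-8):
    # Hypothetical custom loss; eps in the denominator prevents division by zero
    return ((pred - target) / (target + eps)).abs().mean()

pred = torch.tensor([1.0, 2.0], requires_grad=True)
target = torch.tensor([0.0, 2.0])  # a zero target would otherwise divide by zero
loss = relative_error_loss(pred, target)
loss.backward()  # loss and gradients stay finite
```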