
How to Fix CUDA Out of Memory Error in PyTorch

The CUDA out of memory error in PyTorch occurs when your GPU runs out of memory during model training or inference. To fix it, reduce your batch size, drop references to tensors you no longer need (optionally calling torch.cuda.empty_cache()), or move some operations to the CPU.
🔍 Why This Happens

This error occurs because your GPU does not have enough free memory to hold the model parameters, activations, and data for the current step. Large batch sizes, big models, and memory leaks (for example, keeping references to loss tensors across iterations) are the usual causes.

python
import torch

def train():
    model = torch.nn.Linear(1000, 1000).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(100):
        inputs = torch.randn(512, 1000).cuda()  # Large batch size
        outputs = model(inputs)
        loss = outputs.sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

train()
Output
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 4.00 GiB total capacity; 3.00 GiB already allocated; 512.00 MiB free; 3.10 GiB reserved in total by PyTorch)
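Before fixing anything, it helps to see how much memory PyTorch is actually using. A minimal sketch; the report_memory helper name is ours, built only on documented torch.cuda calls:

```python
import torch

def report_memory(tag=""):
    """Print PyTorch's allocator counters for the current GPU (hypothetical helper)."""
    if not torch.cuda.is_available():
        print(f"{tag}: no CUDA device available")
        return
    allocated = torch.cuda.memory_allocated() / 1024**2  # MiB held by live tensors
    reserved = torch.cuda.memory_reserved() / 1024**2    # MiB cached by PyTorch's allocator
    print(f"{tag}: {allocated:.1f} MiB allocated, {reserved:.1f} MiB reserved")

report_memory("before batch")
```

Comparing "allocated" to "reserved" tells you how much of the "reserved in total by PyTorch" figure in the error is cache rather than live tensors.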
🔧 The Fix

Reduce the batch size so each iteration holds less data on the GPU; this is the primary fix. You can also call torch.cuda.empty_cache() to release cached memory blocks back to the driver, which helps when other processes share the GPU. Note that empty_cache() does not give PyTorch itself more usable memory, and calling it every iteration adds some overhead.

python
import torch

def train_fixed():
    model = torch.nn.Linear(1000, 1000).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(100):
        inputs = torch.randn(128, 1000).cuda()  # Smaller batch size
        outputs = model(inputs)
        loss = outputs.sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        torch.cuda.empty_cache()  # Release cached memory to the driver (optional)

train_fixed()
Output
No error, training runs successfully.
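If the error appears during inference rather than training, a complementary fix (a sketch under the same small-model assumptions as above) is to disable gradient tracking, so activations are not kept around for a backward pass:

```python
import torch

def infer(model, inputs):
    model.eval()              # switch off training-only behavior (dropout, etc.)
    with torch.no_grad():     # no autograd graph, so activations are freed immediately
        return model(inputs)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1000, 1000).to(device)
out = infer(model, torch.randn(8, 1000, device=device))
print(out.shape)  # torch.Size([8, 1000])
```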
🛡️ Prevention

To avoid this error in the future, monitor GPU memory usage during training. Use smaller batch sizes or gradient accumulation if needed. Free unused variables (with del, or by letting them go out of scope) and call optimizer.zero_grad(set_to_none=True) so gradient tensors are actually deallocated rather than just zeroed. Consider mixed precision training to reduce memory use.
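Gradient accumulation, mentioned above, keeps the effective batch size while only holding a small micro-batch in GPU memory at once. A sketch of the idea (the batch and step counts are illustrative):

```python
import torch

def train_accumulated(accum_steps=4, micro_batch=32):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(1000, 1000).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    optimizer.zero_grad(set_to_none=True)
    for step in range(8):
        inputs = torch.randn(micro_batch, 1000, device=device)
        loss = model(inputs).sum() / accum_steps  # scale so gradients average out
        loss.backward()                           # gradients accumulate across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()                      # one update per accum_steps micro-batches
            optimizer.zero_grad(set_to_none=True)
    return model

train_accumulated()
```

Here four micro-batches of 32 behave like one batch of 128, but only 32 samples' activations live on the GPU at a time.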

⚠️ Related Errors

Other common GPU errors include RuntimeError: CUDA error: device-side assert triggered, usually caused by out-of-range indices or class labels, and cuDNN error: CUDNN_STATUS_ALLOC_FAILED, which also points to a memory allocation failure. Fixes often involve checking tensor shapes and index ranges, and reducing memory load.

Key Takeaways

Reduce batch size to lower GPU memory usage and avoid out of memory errors.
Use torch.cuda.empty_cache() to release cached GPU memory, for example when other processes share the device.
Call optimizer.zero_grad() (ideally with set_to_none=True) to reset gradients and free their memory.
Monitor GPU memory usage regularly to catch issues early.
Consider mixed precision training to save memory with little to no loss of accuracy.
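The last takeaway can be sketched with PyTorch's built-in automatic mixed precision (torch.autocast plus a gradient scaler). This is an illustrative sketch, not the article's original code; on a machine without CUDA it falls back to full precision:

```python
import torch

def train_amp(steps=4):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    use_amp = device == "cuda"  # autocast to float16 only on the GPU
    model = torch.nn.Linear(1000, 1000).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # no-op when disabled
    for _ in range(steps):
        inputs = torch.randn(128, 1000, device=device)
        with torch.autocast(device_type=device, enabled=use_amp):
            loss = model(inputs).sum()  # forward pass runs in float16 where safe
        scaler.scale(loss).backward()   # scale loss to avoid float16 gradient underflow
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
    return model

train_amp()
```

Half-precision activations roughly halve the memory of the forward pass, which is often enough to avoid the out of memory error without shrinking the batch.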