How to Fix CUDA Out of Memory Error in NLP Models
A CUDA out of memory error occurs when your GPU runs out of memory during NLP model training or inference. To fix it, reduce the batch size, use gradient accumulation, or enable mixed precision training with torch.cuda.amp.
Why This Happens
This error occurs when the GPU does not have enough memory to hold the model weights, the input batch, and the intermediate activations at once, plus gradients and optimizer state during training. Large NLP models, big batch sizes, and long sequences all increase memory use and can exceed your GPU's capacity.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased').cuda()

inputs = tokenizer(['Hello world!'] * 128, return_tensors='pt', padding=True, truncation=True)
inputs = {k: v.cuda() for k, v in inputs.items()}

outputs = model(**inputs)
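As a rough illustration of where the memory goes, the sketch below estimates what training needs before any activations are stored. It assumes 32-bit weights and gradients, an Adam-style optimizer that keeps two extra values per parameter, and roughly 110 million parameters for bert-base-uncased. The function names and byte counts are illustrative assumptions, not measurements.

```python
def estimate_static_training_bytes(num_params, bytes_per_value=4, optimizer_states=2):
    """Estimate bytes for weights + gradients + optimizer states.

    Activations are deliberately excluded: they depend on batch size
    and sequence length and come on top of this figure.
    """
    weights = num_params * bytes_per_value
    gradients = num_params * bytes_per_value
    optimizer = num_params * bytes_per_value * optimizer_states
    return weights + gradients + optimizer


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / 1024**3


# Assumption: bert-base-uncased has roughly 110 million parameters.
approx_params = 110_000_000
static_bytes = estimate_static_training_bytes(approx_params)
print(f"Static training memory: about {to_gib(static_bytes):.2f} GiB")
```

Activation memory is added to this baseline and grows with batch size and sequence length, which is why shrinking the batch is usually the first thing to try.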
The Fix
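Reduce the batch size to lower memory use. Use gradient accumulation to simulate a larger batch: run several small batches, sum their gradients, and update the weights once, so peak memory stays at the level of the small batch. Enable mixed precision training with torch.cuda.amp so that many operations run in half precision, which shrinks the memory their activations need.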
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Use smaller batch size
batch_size = 16

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased').cuda()

inputs = tokenizer(['Hello world!'] * batch_size, return_tensors='pt', padding=True, truncation=True)
inputs = {k: v.cuda() for k, v in inputs.items()}

# Use mixed precision. The GradScaler only matters during training, where it
# scales the loss before backward(); this forward-only example does not use it.
# Recent PyTorch releases prefer torch.amp.autocast('cuda') over torch.cuda.amp.autocast().
scaler = torch.cuda.amp.GradScaler()
with torch.cuda.amp.autocast():
    outputs = model(**inputs)

print('Model ran successfully with reduced batch size and mixed precision.')
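The example above does not show gradient accumulation, so here is a minimal sketch of that technique. It uses a small linear layer and random data as hypothetical stand-ins for a real NLP model and dataset, and it falls back to the CPU when no GPU is present. The sizes and variable names are assumptions chosen for illustration.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)

# Hypothetical stand-in for a real model; any nn.Module works the same way.
model = nn.Linear(32, 2).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

micro_batch_size = 16      # what actually has to fit in GPU memory
accumulation_steps = 4     # effective batch size = 16 * 4 = 64

# Random data standing in for tokenized inputs and labels.
features = torch.randn(128, 32)
labels = torch.randint(0, 2, (128,))

optimizer_steps = 0
optimizer.zero_grad(set_to_none=True)
for step, start in enumerate(range(0, len(features), micro_batch_size), start=1):
    x = features[start:start + micro_batch_size].to(device)
    y = labels[start:start + micro_batch_size].to(device)

    # Divide the loss so the accumulated gradient matches the average
    # over the larger effective batch.
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()  # gradients add up in .grad across micro-batches

    # Update the weights only once every `accumulation_steps` micro-batches.
    if step % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
        optimizer_steps += 1

print(f"Optimizer steps taken: {optimizer_steps}")
```

If you train with the Hugging Face Trainer instead of a manual loop, the gradient_accumulation_steps setting in TrainingArguments serves the same purpose.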
Prevention
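To avoid this error in the future, monitor GPU memory usage during training, for example with nvidia-smi or torch.cuda.memory_allocated(). Use smaller batch sizes or gradient accumulation for large models and long sequences. Enable mixed precision training to save memory. Also delete references to tensors you no longer need; calling torch.cuda.empty_cache() afterwards returns PyTorch's cached but unused memory to the GPU, although it cannot free tensors that are still referenced.
Consider using gradient checkpointing, which recomputes activations during the backward pass instead of storing them, or a smaller model variant if memory is limited.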
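The monitoring and cleanup steps can be sketched as follows. The helper names gpu_memory_mib and release_cached_memory are hypothetical; the torch.cuda calls inside them are standard PyTorch functions. The code falls back to the CPU when no GPU is available, in which case there is nothing to report.

```python
import gc

import torch


def gpu_memory_mib():
    """Return current GPU memory statistics in MiB, or None without a GPU."""
    if not torch.cuda.is_available():
        return None
    mib = 1024**2
    return {
        "allocated": torch.cuda.memory_allocated() / mib,       # live tensors
        "reserved": torch.cuda.memory_reserved() / mib,         # held by PyTorch's cache
        "peak_allocated": torch.cuda.max_memory_allocated() / mib,
    }


def release_cached_memory():
    """Collect garbage, then hand cached, unused GPU blocks back to the driver."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


device = "cuda" if torch.cuda.is_available() else "cpu"

# A throwaway tensor standing in for activations or intermediate results.
activations = torch.randn(1024, 1024, device=device)
before = gpu_memory_mib()

# Drop the reference first; empty_cache() cannot free tensors still in use.
del activations
release_cached_memory()
after = gpu_memory_mib()

print("before:", before)
print("after: ", after)
```

Logging these numbers every few hundred steps makes it easier to spot memory that grows over time, which often points to tensors being kept alive by accident, such as losses stored without calling .item().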
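Assuming that "checkpointing" here refers to gradient (activation) checkpointing, a minimal sketch looks like this. It uses torch.utils.checkpoint on a small hypothetical feed-forward block rather than a real NLP model, and runs on the CPU when no GPU is present.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)

# Hypothetical block standing in for one transformer layer.
block = nn.Sequential(
    nn.Linear(64, 256),
    nn.GELU(),
    nn.Linear(256, 64),
).to(device)
head = nn.Linear(64, 2).to(device)

x = torch.randn(8, 64, device=device)

# The block's intermediate activations are not stored during the forward
# pass; they are recomputed when backward() reaches this block.
hidden = checkpoint(block, x, use_reentrant=False)
loss = head(hidden).sum()
loss.backward()

grads_ok = all(p.grad is not None for p in block.parameters())
print("Gradients reached the checkpointed block:", grads_ok)
```

The trade-off is extra compute for the recomputation in exchange for lower peak memory. For Hugging Face Transformers models, calling model.gradient_checkpointing_enable() applies the same idea to the model's layers.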
Related Errors
Other related errors include:
- CUDA memory fragmentation - Free GPU memory is split into pieces too small to satisfy a request, so an out of memory error can appear even when the total free memory looks sufficient; restarting the program or clearing the cache helps.
- CUDA device not available - PyTorch cannot see a GPU, usually because the GPU driver or a CUDA-enabled build of PyTorch is not properly installed.
- Out of CPU memory - Happens when system RAM is insufficient; reduce the data size or use data loaders that load smaller batches instead of the whole dataset at once.
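A quick diagnostic for the first two items might look like the sketch below. The describe_device helper is hypothetical, and the max_split_size_mb value of 128 is an illustrative assumption rather than a recommended setting. The allocator option is read when CUDA memory is first used, so it is set before importing torch.

```python
import os

# Allocator settings are read when CUDA memory is first used, so set the
# variable early. setdefault keeps any value already chosen by the user.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch


def describe_device():
    """Return a short description of the device PyTorch will use."""
    if not torch.cuda.is_available():
        return "cpu (CUDA not available: check the GPU driver and the installed PyTorch build)"
    index = torch.cuda.current_device()
    return f"cuda:{index} ({torch.cuda.get_device_name(index)})"


device_summary = describe_device()
print("Running on:", device_summary)
print("Allocator config:", os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Limiting the split size can reduce fragmentation at some cost in allocation speed, so it is worth measuring before keeping the setting.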
