What is torch.cuda.amp in PyTorch: Automatic Mixed Precision Explained
torch.cuda.amp is a module for automatic mixed precision (AMP) that helps speed up training and reduce GPU memory use by mixing 16-bit and 32-bit floating-point operations. It automatically manages when to use lower precision for faster computation while keeping model accuracy stable.
How It Works
torch.cuda.amp works like a smart assistant that decides when to use faster, smaller 16-bit numbers (called half precision) and when to use the usual 32-bit numbers (full precision) during training. This mix helps the computer run calculations faster and use less memory, similar to how you might use a shortcut for simple math but switch to full calculations for tricky parts.
It uses two main tools: a GradScaler that scales up small numbers to avoid errors when using 16-bit, and an autocast context that automatically picks the right precision for each operation. This way, you don’t have to change your model code much, but you get faster training and less memory use.
Example
This example shows how to use torch.cuda.amp to train a simple model with automatic mixed precision.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler

# Simple model
model = nn.Linear(10, 1).cuda()
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Data
inputs = torch.randn(16, 10).cuda()
targets = torch.randn(16, 1).cuda()

# Create GradScaler
scaler = GradScaler()

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    with autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
    # Scale loss and backpropagate (outside the autocast region)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
When to Use
Use torch.cuda.amp when training deep learning models on NVIDIA GPUs to speed up training and reduce memory use with little or no loss in accuracy. It is especially helpful for large models or datasets where training time and GPU memory are limited.
Common use cases include training convolutional neural networks for image tasks, transformers for language tasks, or any model where faster training can save time and cost. It works best on GPUs with Tensor Cores designed for mixed precision.
Key Points
- torch.cuda.amp enables automatic mixed precision to speed up training.
- It mixes 16-bit and 32-bit floating-point operations safely.
- Use autocast to wrap forward passes and GradScaler to scale gradients.
- Reduces GPU memory use and can improve training speed.
- Requires NVIDIA GPUs with CUDA support.