Mixed precision training speeds up model training and reduces memory use by running most operations in 16-bit floating point while keeping selected computations in 32-bit for numerical stability.
Mixed precision training (AMP) in PyTorch
Introduction
Mixed precision training is useful for:
- Training deep learning models on GPUs with limited memory.
- Speeding up training without losing model accuracy.
- Running large models that otherwise don't fit in GPU memory.
- Reducing electricity and hardware costs during training.
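The memory saving comes from float16 storing each value in 2 bytes instead of float32's 4. A quick sketch of the difference (plain PyTorch, no GPU required):

```python
import torch

# A float32 tensor uses 4 bytes per element; float16 uses 2.
full = torch.randn(1000, 1000)  # float32 by default
half = full.half()              # float16 copy of the same data

bytes_full = full.element_size() * full.nelement()
bytes_half = half.element_size() * half.nelement()

print(bytes_full)  # 4000000 (4 MB)
print(bytes_half)  # 2000000 (2 MB) -- half the storage
```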
Syntax
PyTorch
```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for data, target in dataloader:
    optimizer.zero_grad()
    with autocast():
        output = model(data)
        loss = loss_fn(output, target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```
autocast() runs operations in mixed precision automatically.
GradScaler helps keep training stable by scaling gradients.
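To see what autocast does concretely, the device-agnostic torch.autocast context (available since PyTorch 1.10) downcasts eligible operations such as matrix multiplication to the lower precision. A minimal sketch that runs on CPU with bfloat16, so no GPU is needed:

```python
import torch

a = torch.randn(8, 8)  # float32 inputs
b = torch.randn(8, 8)

# Inside autocast, matmul is automatically run in the lower precision.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    c = a @ b

print(a.dtype)  # torch.float32 -- inputs are untouched
print(c.dtype)  # torch.bfloat16 -- the matmul output was autocast
```

The inputs stay float32; only the operation's execution and output are downcast, which is why no manual casting is needed in the training loop.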
Examples
This runs the forward pass and loss calculation in mixed precision.
PyTorch
```python
with autocast():
    output = model(data)
    loss = loss_fn(output, target)
```

This scales the loss, performs backpropagation, steps the optimizer, and updates the scale factor.
PyTorch
```python
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```
Sample Model
This code trains a simple model for 3 epochs using mixed precision. It prints the loss each epoch.
PyTorch
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler

# Simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

# Data
x = torch.randn(20, 10).cuda()
y = torch.randn(20, 1).cuda()

model = SimpleModel().cuda()
optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
scaler = GradScaler()

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    with autocast():
        output = model(x)
        loss = loss_fn(output, y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
```
Important Notes
Mixed precision delivers the largest speedups on GPUs with Tensor Cores (NVIDIA Volta architecture and later, such as V100, RTX-series, and A100 cards).
Always use GradScaler when training with float16: without loss scaling, small gradient values can underflow to zero and halt learning.
Check your model's accuracy to ensure mixed precision does not reduce it.
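The underflow problem GradScaler guards against can be seen directly: float16 cannot represent values below roughly 6e-8, so tiny gradients round to zero unless scaled up first. A small illustration (the 2**16 factor matches GradScaler's default init_scale of 65536):

```python
import torch

tiny_grad = 1e-8  # below float16's smallest subnormal (~5.96e-8)

unscaled = torch.tensor(tiny_grad, dtype=torch.float16)
scaled = torch.tensor(tiny_grad * 2**16, dtype=torch.float16)  # GradScaler-style scaling

print(unscaled.item())  # 0.0 -- the value underflowed to zero
print(scaled.item())    # nonzero (~0.000655) -- still representable
```

This is why the training loop calls scaler.scale(loss).backward() rather than loss.backward(): gradients are computed at the scaled magnitude, then unscaled inside scaler.step() before the optimizer update.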
Summary
Mixed precision training speeds up training and reduces memory use by running operations in 16-bit precision wherever it is numerically safe.
Use autocast() to run the forward pass and loss computation in mixed precision automatically.
Use GradScaler to keep training stable when using mixed precision.