What is torch.cuda.amp in PyTorch: Automatic Mixed Precision Explained
torch.cuda.amp is a module for automatic mixed precision (AMP) that helps speed up training and reduce GPU memory use by mixing 16-bit and 32-bit floating-point operations. It automatically manages when to use lower precision for faster computation while keeping model accuracy stable.
How It Works
torch.cuda.amp works like a smart assistant that decides when to use faster, smaller 16-bit numbers (called half precision) and when to use the usual 32-bit numbers (full precision) during training. This mix helps the computer run calculations faster and use less memory, similar to how you might use a shortcut for simple math but switch to full calculations for tricky parts.
It uses two main tools: a GradScaler that scales up small numbers to avoid errors when using 16-bit, and an autocast context that automatically picks the right precision for each operation. This way, you don’t have to change your model code much, but you get faster training and less memory use.
Example
This example shows how to use torch.cuda.amp to train a simple model with automatic mixed precision.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler

# Simple model
model = nn.Linear(10, 1).cuda()
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Data
inputs = torch.randn(16, 10).cuda()
targets = torch.randn(16, 1).cuda()

# Create GradScaler
scaler = GradScaler()

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    with autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
    # Scale loss and backpropagate (outside the autocast region)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
When to Use
Use torch.cuda.amp when training deep learning models on NVIDIA GPUs to speed up training and reduce memory use with little or no loss in accuracy. It is especially helpful for large models or datasets where training time and GPU memory are limited.
Common use cases include training convolutional neural networks for image tasks, transformers for language tasks, or any model where faster training can save time and cost. It works best on GPUs with Tensor Cores designed for mixed precision.
Key Points
- torch.cuda.amp enables automatic mixed precision to speed up training.
- It mixes 16-bit and 32-bit floating-point operations safely.
- Use autocast to wrap forward passes and GradScaler to scale gradients.
- Reduces GPU memory use and can improve training speed.
- Requires NVIDIA GPUs with CUDA support.