Mixed precision training speeds up model training and reduces memory use by running most operations in 16-bit floating point while keeping selected computations in 32-bit for numerical stability.
Mixed precision training (AMP) in PyTorch
Introduction
Mixed precision training is useful for:
- Training deep learning models on GPUs with limited memory.
- Speeding up training without losing model accuracy.
- Running large models that otherwise don't fit in GPU memory.
- Reducing electricity and hardware costs during training.
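The memory saving comes from float16 storing each value in 2 bytes instead of float32's 4. A quick sketch of the difference (plain PyTorch, no GPU required):

```python
import torch

# A float32 tensor uses 4 bytes per element; float16 uses 2.
full = torch.randn(1000, 1000)  # float32 by default
half = full.half()              # float16 copy of the same data

bytes_full = full.element_size() * full.nelement()
bytes_half = half.element_size() * half.nelement()

print(bytes_full)  # 4000000 (4 MB)
print(bytes_half)  # 2000000 (2 MB) -- half the storage
```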
Syntax
PyTorch
```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for data, target in dataloader:
    optimizer.zero_grad()
    with autocast():
        output = model(data)
        loss = loss_fn(output, target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```
autocast() runs operations in mixed precision automatically.
GradScaler helps keep training stable by scaling gradients.
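To see what autocast does concretely, the device-agnostic torch.autocast context (available since PyTorch 1.10) downcasts eligible operations such as matrix multiplication to the lower precision. A minimal sketch that runs on CPU with bfloat16, so no GPU is needed:

```python
import torch

a = torch.randn(8, 8)  # float32 inputs
b = torch.randn(8, 8)

# Inside autocast, matmul is automatically run in the lower precision.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    c = a @ b

print(a.dtype)  # torch.float32 -- inputs are untouched
print(c.dtype)  # torch.bfloat16 -- the matmul output was autocast
```

The inputs stay float32; only the operation's execution and output are downcast, which is why no manual casting is needed in the training loop.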
Examples
This runs the forward pass and loss calculation in mixed precision.
PyTorch
```python
with autocast():
    output = model(data)
    loss = loss_fn(output, target)
```

This scales the loss, performs backpropagation, steps the optimizer, and updates the scale factor.
PyTorch
```python
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```
Sample Model
This code trains a simple model for 3 epochs using mixed precision. It prints the loss each epoch.
PyTorch
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler

# Simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

# Data
x = torch.randn(20, 10).cuda()
y = torch.randn(20, 1).cuda()

model = SimpleModel().cuda()
optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
scaler = GradScaler()

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    with autocast():
        output = model(x)
        loss = loss_fn(output, y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
```
Important Notes
Mixed precision delivers the largest speedups on GPUs with Tensor Cores (NVIDIA Volta architecture and later, such as V100, RTX-series, and A100 cards).
Always use GradScaler when training with float16: without loss scaling, small gradient values can underflow to zero and halt learning.
Check your model's accuracy to ensure mixed precision does not reduce it.
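The underflow problem GradScaler guards against can be seen directly: float16 cannot represent values below roughly 6e-8, so tiny gradients round to zero unless scaled up first. A small illustration (the 2**16 factor matches GradScaler's default init_scale of 65536):

```python
import torch

tiny_grad = 1e-8  # below float16's smallest subnormal (~5.96e-8)

unscaled = torch.tensor(tiny_grad, dtype=torch.float16)
scaled = torch.tensor(tiny_grad * 2**16, dtype=torch.float16)  # GradScaler-style scaling

print(unscaled.item())  # 0.0 -- the value underflowed to zero
print(scaled.item())    # nonzero (~0.000655) -- still representable
```

This is why the training loop calls scaler.scale(loss).backward() rather than loss.backward(): gradients are computed at the scaled magnitude, then unscaled inside scaler.step() before the optimizer update.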
Summary
Mixed precision training speeds up training and reduces memory use by running operations in 16-bit precision wherever it is numerically safe.
Use autocast() to run the forward pass and loss computation in mixed precision automatically.
Use GradScaler to keep training stable when using mixed precision.