
Multi-GPU training in PyTorch

Introduction

Training on multiple GPUs speeds up machine learning by dividing the work across devices, letting you handle larger models and datasets efficiently.

Multi-GPU training is useful:
When a large neural network takes too long to train on one GPU.
When you want to speed up training by splitting each batch across GPUs.
When your model or batch size exceeds a single GPU's memory.
When experimenting with larger datasets that need more compute.
Syntax
PyTorch
model = YourModel()
model = torch.nn.DataParallel(model)
model = model.to(device)

for data, target in dataloader:
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()

torch.nn.DataParallel wraps your model to run on multiple GPUs automatically.

Make sure your input data and model are moved to the correct device (usually 'cuda').

Examples
This wraps the model to use all available GPUs and moves it to GPU memory.
PyTorch
model = MyModel()
model = torch.nn.DataParallel(model)
model = model.cuda()
Move each batch of data to the GPU before feeding it to the model.
PyTorch
for inputs, labels in dataloader:
    inputs, labels = inputs.cuda(), labels.cuda()
    outputs = model(inputs)
Sample Model

This code trains a simple model on dummy data using multiple GPUs if available. It prints the average loss per epoch.

PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)
    def forward(self, x):
        return self.fc(x)

# Create dummy data
x = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))

# Dataset and loader
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=20)

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Model
model = SimpleNet()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training loop
model.train()
for epoch in range(2):
    total_loss = 0
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader):.4f}')
Important Notes

DataParallel splits input batches automatically across GPUs and gathers results.
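The scatter step behaves like splitting the batch along dimension 0, one chunk per GPU, and the gather step concatenates the per-replica outputs back on the default device. A minimal CPU sketch of that idea (not DataParallel's actual implementation):

```python
import torch

batch = torch.randn(20, 10)   # one batch from the dataloader
num_gpus = 2                  # pretend we have two devices

# Scatter: split the batch along dim 0, one chunk per replica
chunks = torch.chunk(batch, num_gpus, dim=0)
print([c.shape[0] for c in chunks])  # each replica sees 10 samples

# Gather: concatenate the per-replica outputs back along dim 0
gathered = torch.cat(chunks, dim=0)
print(gathered.shape)  # torch.Size([20, 10])
```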

For better performance on multiple GPUs, use torch.nn.parallel.DistributedDataParallel in real projects: it runs one process per GPU and avoids the single-process overhead of DataParallel.

Always check if GPUs are available with torch.cuda.is_available() before using them.

Summary

Multi-GPU training speeds up model training by sharing work across GPUs.

Use torch.nn.DataParallel to easily enable multi-GPU support.

Move both the model and each batch of data to the GPU to ensure proper training.