Using multiple GPUs speeds up model training by splitting the work across devices, which lets you handle larger batches and bigger models.
Multi-GPU training in PyTorch
model = YourModel()
model = torch.nn.DataParallel(model)
model = model.to(device)

for data, target in dataloader:
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
torch.nn.DataParallel wraps your model to run on multiple GPUs automatically.
Make sure your input data and model are moved to the correct device (usually 'cuda').
model = MyModel()
model = torch.nn.DataParallel(model)
model = model.cuda()

for inputs, labels in dataloader:
    inputs, labels = inputs.cuda(), labels.cuda()
    outputs = model(inputs)
This code trains a simple model on dummy data using multiple GPUs if available. It prints the average loss per epoch.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

# Create dummy data
x = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))

# Dataset and loader
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=20)

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Model
model = SimpleNet()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training loop
model.train()
for epoch in range(2):
    total_loss = 0
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader):.4f}')
DataParallel splits each input batch along the batch dimension, runs a replica of the model on every available GPU, and gathers the outputs back onto the first device.
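To see that split in action, here is a small sketch (assuming at least one CUDA device is visible; ShapeEcho is just an illustrative name, not part of any library). A print inside forward shows how large a slice each replica receives.

import torch
import torch.nn as nn

class ShapeEcho(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        # Under DataParallel each replica only sees its slice of the batch
        print(f'replica on {x.device} received {x.size(0)} samples')
        return self.fc(x)

model = nn.DataParallel(ShapeEcho()).cuda()
outputs = model(torch.randn(32, 10).cuda())
# With 2 GPUs each replica typically prints 16; the gathered output is [32, 2]
print(outputs.shape)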
DataParallel runs in a single process and re-replicates the model on every forward pass, so for better performance on multiple GPUs, consider torch.nn.parallel.DistributedDataParallel in real projects, as sketched below.
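The sketch below shows the usual DistributedDataParallel pattern: one process per GPU, launched with torchrun, with a DistributedSampler giving each process its own shard of the data. Treat it as a minimal outline rather than a drop-in script; the file name train_ddp.py in the launch command is just a placeholder.

import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun starts one process per GPU and sets LOCAL_RANK for each
    dist.init_process_group(backend='nccl')
    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)

    # Dummy data; DistributedSampler hands each process a distinct shard
    dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
    sampler = DistributedSampler(dataset)
    dataloader = DataLoader(dataset, batch_size=20, sampler=sampler)

    model = nn.Linear(10, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for inputs, labels in dataloader:
            inputs = inputs.cuda(local_rank)
            labels = labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()  # gradients are averaged across processes here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    main()

Launch with one process per GPU, for example: torchrun --nproc_per_node=<number_of_gpus> train_ddp.py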
Always check if GPUs are available with torch.cuda.is_available() before using them.
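For example, a one-off check mirroring the device selection already used in the full script above:

import torch

# Fall back to the CPU when no GPU is visible
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using {device}; visible GPUs: {torch.cuda.device_count()}')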
Multi-GPU training speeds up model training by sharing work across GPUs.
Use torch.nn.DataParallel to easily enable multi-GPU support.
Move both the model and the data to the same GPU device; tensors left on the CPU will trigger device-mismatch errors.