
DataParallel basics in PyTorch

Introduction

DataParallel helps your program use multiple GPUs to train a model faster: it replicates the model on each GPU, splits each input batch across them, and gathers the results.

You want to train a neural network faster using more than one GPU.
Your model fits on a single GPU, but you want extra GPUs to speed up training.
You have a desktop or server with multiple GPUs and want to use them easily.
You want to keep your code simple while still using multiple GPUs.
Syntax
PyTorch
model = torch.nn.DataParallel(model)
output = model(input)

Wrap your model with DataParallel before training.

Input batches are automatically split across the available GPUs along the first (batch) dimension.
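For intuition, the split DataParallel performs is essentially torch.chunk along the batch dimension. This sketch assumes a hypothetical machine with 4 GPUs (no GPU is actually needed to run it):

```python
import torch

# A batch of 16 samples; with 4 GPUs, DataParallel would scatter
# roughly equal chunks of 4 samples to each device.
batch = torch.randn(16, 10)
num_gpus = 4  # hypothetical device count, for illustration only
chunks = torch.chunk(batch, num_gpus, dim=0)
print([c.shape[0] for c in chunks])  # [4, 4, 4, 4] — one chunk per GPU
```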

Examples
Wrap a simple linear model to use DataParallel.
PyTorch
import torch
import torch.nn as nn

model = nn.Linear(10, 5)
model = torch.nn.DataParallel(model)
Pass a batch of 16 samples to the parallel model.
PyTorch
output = model(torch.randn(16, 10))
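Because the per-GPU outputs are gathered back together, the result keeps the full batch size. A quick shape check (this sketch also runs on a CPU-only machine, where DataParallel simply forwards to the wrapped module):

```python
import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(10, 5))
output = model(torch.randn(16, 10))
print(output.shape)  # torch.Size([16, 5]) — full batch, gathered on one device
```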
Sample Model

This code creates a simple model, wraps it with DataParallel if multiple GPUs are available, and runs one training step with dummy data.

PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)
    def forward(self, x):
        return self.linear(x)

# Check if multiple GPUs are available
if torch.cuda.device_count() > 1:
    print(f"Using {torch.cuda.device_count()} GPUs")
else:
    print("Using single GPU or CPU")

# Create model and move to GPUs
model = SimpleModel()
if torch.cuda.is_available():
    model = model.cuda()
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)

# Dummy data
inputs = torch.randn(32, 10)
labels = torch.randint(0, 2, (32,))
if torch.cuda.is_available():
    inputs = inputs.cuda()
    labels = labels.cuda()

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Forward pass
outputs = model(inputs)
loss = criterion(outputs, labels)

# Backward and optimize
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(f"Loss: {loss.item():.4f}")
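One gotcha worth knowing: wrapping with DataParallel prefixes every parameter name with "module.", so a state_dict saved from the wrapper will not load into a plain model. A minimal sketch of the usual workaround, saving the inner module's state_dict (SimpleModel here is the same class as in the sample above):

```python
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)
    def forward(self, x):
        return self.linear(x)

model = nn.DataParallel(SimpleModel())

# Save the underlying module's state_dict, not the wrapper's,
# so the checkpoint's keys have no "module." prefix.
state = model.module.state_dict()

# The checkpoint now loads cleanly into an unwrapped model.
restored = SimpleModel()
restored.load_state_dict(state)
```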
Important Notes

DataParallel splits input batches automatically across the available GPUs.

Model outputs are gathered back on the first GPU (device 0).

DataParallel is easy but not the fastest multi-GPU method: it runs in a single process, so replication and gathering overhead limit scaling. Consider DistributedDataParallel for larger projects.
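For a rough taste of that alternative, here is a minimal single-process DistributedDataParallel sketch using the CPU "gloo" backend. Real multi-GPU training would launch one process per GPU (e.g. with torchrun); the address and port values below are illustrative assumptions:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup for demonstration; torchrun normally sets these.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # assumed local address
os.environ.setdefault("MASTER_PORT", "29500")      # assumed free port
dist.init_process_group("gloo", rank=0, world_size=1)

# With world_size=1 this behaves like a plain model, but the same code
# scales to one process per GPU.
model = DDP(nn.Linear(10, 2))
out = model(torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 2])

dist.destroy_process_group()
```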

Summary

DataParallel lets you use multiple GPUs easily by wrapping your model.

It splits input data and combines outputs automatically.

Good for beginners to speed up training without complex code changes.