How to Use DataParallel in PyTorch for Multi-GPU Training
Use torch.nn.DataParallel to wrap your model and enable multi-GPU training; it distributes each input batch across the available GPUs automatically. Wrap your model with DataParallel(model) and move it to a CUDA device to start parallel training.

Syntax
The basic syntax to use DataParallel is to wrap your existing model like this:
- model = YourModel(): create your model instance.
- model = torch.nn.DataParallel(model): wrap the model to enable multi-GPU usage.
- model.to('cuda'): move the model to the GPU.
During training, just use model(input) as usual. DataParallel splits the input batch across GPUs and gathers outputs automatically.
python
model = YourModel()
model = torch.nn.DataParallel(model)
model.to('cuda')
output = model(input)
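DataParallel also accepts optional device_ids and output_device arguments to control which GPUs participate. A minimal sketch (the two-GPU device_ids list is an illustrative assumption; the wrap is skipped on machines with fewer than two GPUs, so the same code runs anywhere):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)

# Wrap only when more than one GPU is present; device_ids restricts
# which GPUs participate (here, hypothetically, the first two).
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1])

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

x = torch.randn(16, 10, device=device)
print(model(x).shape)  # torch.Size([16, 5])
```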
Example
This example shows how to use DataParallel to train a simple neural network on multiple GPUs. It creates random input data, wraps the model, and runs a forward pass.
python
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        return self.linear(x)

# Create model and wrap with DataParallel
model = SimpleModel()
model = nn.DataParallel(model)
model.to('cuda')

# Create dummy input batch of size 16
input_tensor = torch.randn(16, 10).to('cuda')

# Forward pass
output = model(input_tensor)
print('Output shape:', output.shape)
Output
Output shape: torch.Size([16, 5])
Common Pitfalls
- Not moving input to GPU: Inputs should be on the same device as the model (usually CUDA) before the forward pass; passing CPU tensors can fail or add costly host-to-device copies.
- Accessing model attributes: After wrapping, access the original model with model.module to save or modify it.
- Single-GPU fallback: DataParallel works with one GPU but adds overhead; use it only when multiple GPUs are available.
python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)
model = nn.DataParallel(model)
model.to('cuda')

# Wrong: input left on CPU
input_cpu = torch.randn(16, 10)
# output = model(input_cpu)  # may error or add slow host-to-device copies

# Right: move input to CUDA
input_cuda = input_cpu.to('cuda')
output = model(input_cuda)

# Access the original model
original_model = model.module
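One place model.module matters in practice is checkpointing: saving the wrapped model directly prefixes every state-dict key with 'module.', which then fails to load into an unwrapped model. A sketch of the safer pattern (the 'checkpoint.pt' filename is just an example):

```python
import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(10, 5))

# Save the underlying module's weights; keys then carry no 'module.' prefix
torch.save(model.module.state_dict(), 'checkpoint.pt')

# The checkpoint loads cleanly into a plain, unwrapped model
fresh = nn.Linear(10, 5)
fresh.load_state_dict(torch.load('checkpoint.pt'))
```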
Quick Reference
- Wrap your model: model = nn.DataParallel(model)
- Move model and inputs to CUDA: model.to('cuda'), input = input.to('cuda')
- Use model(input) as usual
- Access the original model with model.module
- Works best when the batch size is large enough to split across GPUs
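Putting the reference together, here is a minimal training-loop sketch (random data, SGD, and MSE loss are stand-ins for your own pipeline; the conditional wrap lets the same code fall back to a plain model on single-GPU or CPU machines):

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.Linear(10, 5)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(3):
    inputs = torch.randn(32, 10, device=device)   # batch is split across GPUs
    targets = torch.randn(32, 5, device=device)   # outputs gathered on one device
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```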
Key Takeaways
- Wrap your model with torch.nn.DataParallel to enable multi-GPU training easily.
- Always move both model and input tensors to CUDA devices before using DataParallel.
- Access the original model via model.module after wrapping for saving or modifications.
- DataParallel automatically splits input batches and gathers outputs across GPUs.
- Use DataParallel only if you have multiple GPUs and a large enough batch size.