PyTorch · How-To · Beginner · 3 min read

How to Use DataParallel in PyTorch for Multi-GPU Training

Use torch.nn.DataParallel to enable multi-GPU training: wrap your model with DataParallel(model) and move it to a CUDA device, and input batches are distributed across the available GPUs automatically.
📐

Syntax

The basic syntax to use DataParallel is to wrap your existing model like this:

  • model = YourModel(): create your model instance.
  • model = torch.nn.DataParallel(model): wrap the model to enable multi-GPU usage.
  • model.to('cuda'): move the model's parameters to the default CUDA device; DataParallel replicates them to the other GPUs on each forward pass.

During training, just use model(input) as usual. DataParallel splits the input batch across GPUs and gathers outputs automatically.

python
model = YourModel()
model = torch.nn.DataParallel(model)
model.to('cuda')
output = model(input)
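To see what the scatter step does, the split can be mimicked on CPU: DataParallel divides the input along the batch dimension (dim 0), one chunk per GPU. This is a rough illustration using torch.chunk with two hypothetical GPUs, not the library's internal code.

```python
import torch

# A batch of 16 samples with 10 features each
batch = torch.randn(16, 10)

# With 2 GPUs, DataParallel's scatter is roughly equivalent
# to chunking the batch in two along dim 0
chunks = batch.chunk(2, dim=0)
for i, c in enumerate(chunks):
    print(f'replica {i} receives a chunk of shape {tuple(c.shape)}')
```

Each replica runs the forward pass on its chunk, and the per-replica outputs are gathered back into a single tensor on the default device.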
💻

Example

This example shows how to use DataParallel to train a simple neural network on multiple GPUs. It creates random input data, wraps the model, and runs a forward pass.

python
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)
    def forward(self, x):
        return self.linear(x)

# Create model and wrap with DataParallel
model = SimpleModel()
model = nn.DataParallel(model)
model.to('cuda')

# Create dummy input batch of size 16
input_tensor = torch.randn(16, 10).to('cuda')

# Forward pass
output = model(input_tensor)
print('Output shape:', output.shape)
Output
Output shape: torch.Size([16, 5])
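The example above runs only a forward pass. A full training step looks exactly like single-GPU training, since the gathered output lives on the default device where the loss is computed. A minimal sketch (the loss, optimizer, and random targets are illustrative; it falls back to CPU when no GPU is available, which DataParallel supports by running the module directly):

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs anywhere
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.DataParallel(nn.Linear(10, 5)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

inputs = torch.randn(16, 10).to(device)
targets = torch.randn(16, 5).to(device)

# One training step: forward, loss, backward, update
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()   # per-replica gradients are reduced into the base model
optimizer.step()

print('loss:', loss.item())
```

Backward and the optimizer step need no DataParallel-specific changes; gradients from all replicas are accumulated on the parameters of the wrapped module.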
⚠️

Common Pitfalls

  • Not moving input to GPU: keep inputs on the same device as the model (usually CUDA) before the forward pass, or you risk device-mismatch errors.
  • Accessing model attributes: After wrapping, access the original model with model.module to save or modify.
  • Single GPU fallback: DataParallel works with one GPU but adds overhead; use it only when multiple GPUs are available.
  • Prefer DistributedDataParallel for serious work: the PyTorch documentation recommends DistributedDataParallel over DataParallel even on a single machine, since DataParallel is single-process multi-threaded and suffers from Python GIL contention.
python
import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(10, 5))
model.to('cuda')

# Wrong: input left on CPU while the model is on CUDA
input_cpu = torch.randn(16, 10)
# output = model(input_cpu)  # may raise a device-mismatch error

# Right: move the input to CUDA first
input_cuda = input_cpu.to('cuda')
output = model(input_cuda)

# Access the original (unwrapped) model, e.g. for saving
original_model = model.module
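The model.module pitfall matters most when checkpointing: the wrapped model's state_dict prefixes every key with "module.", so a checkpoint saved from the wrapper will not load into an unwrapped model. A sketch of the safe pattern (the filename is illustrative):

```python
import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(10, 5))

# Wrapped state_dict keys carry a 'module.' prefix
print(list(model.state_dict().keys()))  # ['module.weight', 'module.bias']

# Saving model.module.state_dict() keeps plain keys,
# so the checkpoint loads into an unwrapped model later
torch.save(model.module.state_dict(), 'checkpoint.pt')

plain = nn.Linear(10, 5)
plain.load_state_dict(torch.load('checkpoint.pt'))
```

Saving the inner module also means inference scripts never need to know the model was trained with DataParallel.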
📊

Quick Reference

  • Wrap your model: model = nn.DataParallel(model)
  • Move model and inputs to CUDA: model.to('cuda'), input = input.to('cuda')
  • Use model(input) as usual
  • Access original model with model.module
  • Works best when batch size is large enough to split across GPUs
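The quick-reference steps can be made conditional, so one script runs unchanged on CPU, one GPU, or many. A sketch, following the single-GPU overhead note above by wrapping only when more than one GPU is visible:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)

if torch.cuda.is_available():
    model.to('cuda')
    if torch.cuda.device_count() > 1:
        # Wrap only with multiple GPUs, since DataParallel
        # adds overhead on a single device
        model = nn.DataParallel(model)

print('GPUs visible:', torch.cuda.device_count())
print('wrapped:', isinstance(model, nn.DataParallel))
```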

Key Takeaways

  • Wrap your model with torch.nn.DataParallel to enable multi-GPU training easily.
  • Always move both model and input tensors to CUDA devices before using DataParallel.
  • Access the original model via model.module after wrapping for saving or modifications.
  • DataParallel automatically splits input batches and gathers outputs across GPUs.
  • Use DataParallel only if you have multiple GPUs and a large enough batch size.