Complete the code to import the PyTorch distributed package.
import torch
import torch.[1] as dist
The correct module for distributed training in PyTorch is torch.distributed.
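With the blank filled in, the imports read as follows (a minimal sketch; the check at the end just confirms the module loaded):

```python
import torch
import torch.distributed as dist  # fills blank [1]: the distributed package

# The dist alias now exposes the process-group API.
assert callable(dist.init_process_group)
```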
Complete the code to initialize the distributed process group.
dist.init_process_group(backend=[1], init_method='env://')
'nccl' is the recommended backend for distributed training on GPUs in PyTorch.
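A single-process sketch of the filled-in call. It uses 'gloo' in place of 'nccl' so it runs on a CPU-only machine, and the MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE values are placeholder assumptions for a one-process group:

```python
import os
import torch.distributed as dist

# env:// reads these variables; one process acting as rank 0 of world size 1.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# Blank [1] is the backend name: 'nccl' on GPUs, 'gloo' shown here for CPU.
dist.init_process_group(backend="gloo", init_method="env://")
```

In a real multi-GPU job a launcher such as torchrun sets these environment variables for you, one process per GPU.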
Fix the error in the code to wrap the model for distributed training.
model = torch.nn.parallel.[1](model)
The correct wrapper for distributed training in PyTorch is DistributedDataParallel, found in torch.nn.parallel.
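A self-contained sketch of the wrapped model. The process-group setup mirrors the previous item (single process, 'gloo' backend, placeholder environment values), and the Linear layer is just an illustrative stand-in for a real model:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
dist.init_process_group(backend="gloo", init_method="env://")

model = torch.nn.Linear(4, 2)          # any nn.Module works here
model = DistributedDataParallel(model)  # fills blank [1]; syncs gradients across ranks
```

On GPUs you would first move the model to its device and pass device_ids=[local_rank] to the wrapper.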
Fill both blanks to create a distributed sampler and data loader for training.
train_sampler = torch.utils.data.[1](dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank())
train_loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=[2])
Use DistributedSampler to split data across processes and pass it as the sampler to the DataLoader.
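A filled-in sketch with both blanks resolved. The single-process 'gloo' setup and the small random TensorDataset are assumptions made so the snippet runs standalone:

```python
import os
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
dist.init_process_group(backend="gloo", init_method="env://")

# Toy dataset: 64 samples of 4 features with binary labels.
dataset = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))

# Blank [1]: DistributedSampler gives each rank a disjoint shard of the data.
train_sampler = DistributedSampler(
    dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank()
)
# Blank [2]: the sampler is passed to the DataLoader (do not also set shuffle=True).
train_loader = DataLoader(dataset, batch_size=32, sampler=train_sampler)
```

With more ranks, each process would see roughly 64 / world_size samples per epoch; call train_sampler.set_epoch(epoch) each epoch so shuffling differs across epochs.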
Fill all three blanks to perform a training step with distributed training.
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.[1]()
optimizer.[2]()
dist.barrier()  # [3] synchronization
Call loss.backward() to compute gradients, optimizer.step() to update weights, and dist.barrier() to synchronize all processes.
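All three blanks filled in, as a runnable single-process sketch. The model, loss, optimizer, and random batch are illustrative assumptions; the setup again uses 'gloo' with placeholder environment values:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
dist.init_process_group(backend="gloo", init_method="env://")

model = DistributedDataParallel(torch.nn.Linear(4, 2))
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 4)              # toy batch
labels = torch.randint(0, 2, (8,))

optimizer.zero_grad()                   # clear stale gradients
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()                         # [1]: compute gradients (DDP all-reduces them here)
optimizer.step()                        # [2]: apply the update
dist.barrier()                          # [3]: block until every rank reaches this point
```

Note that DDP already synchronizes gradients inside backward(); the explicit barrier is only needed when you want all ranks aligned before a follow-up action such as checkpointing.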