Complete the code to wrap the model for parallel GPU training using DataParallel.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
model = torch.nn.DataParallel(model)
We wrap the existing model instance with torch.nn.DataParallel to enable parallel GPU training.
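A minimal runnable sketch of this wrapping step (on a machine with no GPUs, DataParallel simply falls back to running the wrapped module, so this also executes on CPU):

```python
import torch
import torch.nn as nn

# Build a plain model, then wrap it for parallel GPU training.
model = nn.Linear(10, 2)
model = torch.nn.DataParallel(model)

# The original model is still reachable via the .module attribute.
print(type(model.module).__name__)  # -> Linear
```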
Complete the code to move the DataParallel model to GPU device 0.
device = torch.device('cuda:0')
model = torch.nn.DataParallel(model)
model = model.to(device)
cuda without parentheses is a method reference, not a call, and cpu() would move the model to the CPU; the to() method moves the model to the specified device.
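A sketch of moving a wrapped model with to(). To keep it runnable without a GPU, this version falls back to the CPU device when CUDA is unavailable; on a GPU machine the branch selects torch.device('cuda:0') as in the exercise:

```python
import torch
import torch.nn as nn

model = torch.nn.DataParallel(nn.Linear(10, 2))

# Pick cuda:0 when available, otherwise fall back to CPU so the
# sketch still runs on a CPU-only machine.
device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
model = model.to(device)

# All parameters now live on the chosen device.
print(next(model.parameters()).device)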
Fix the error in the code to correctly get the model's original module after DataParallel wrapping.
if isinstance(model, torch.nn.DataParallel):
    original_model = model.module
else:
    original_model = model
There is no model.model attribute; when using DataParallel, the original model is accessible via the module attribute.
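The unwrapping pattern above, as a self-contained sketch (it runs on CPU because DataParallel can be constructed without GPUs):

```python
import torch
import torch.nn as nn

model = torch.nn.DataParallel(nn.Linear(10, 2))

# Unwrap safely: wrapped models expose .module, plain models pass through.
if isinstance(model, torch.nn.DataParallel):
    original_model = model.module
else:
    original_model = model

print(type(original_model).__name__)  # -> Linear
```

This guard is useful when saving checkpoints, since state_dict keys from the wrapper are prefixed with "module.".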
Fill both blanks to create a DataParallel model and move it to all available GPUs.
model = nn.Linear(20, 5)
model = torch.nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))
model = model.to('cuda')
cuda without parentheses is a method reference, not a call. We specify device IDs as a list of GPU indices and use to('cuda') to move the model to the GPUs.
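A guarded sketch of the all-GPUs variant. Passing device_ids=None (the default) already uses every visible GPU; listing the indices explicitly with torch.cuda.device_count() is equivalent. The CUDA branch is skipped on a CPU-only machine so the example remains runnable:

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 5)

if torch.cuda.is_available():
    # Enumerate every visible GPU index, e.g. [0, 1] on a 2-GPU box.
    device_ids = list(range(torch.cuda.device_count()))
    model = torch.nn.DataParallel(model, device_ids=device_ids)
    model = model.to('cuda')

print(type(model).__name__)
```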
Fill all three blanks to create a dictionary of losses for each GPU and sum them correctly.
losses = {f'gpu_{i}': output[i].mean() for i in range(2)}
total_loss = sum(loss for loss in losses.values())
We iterate over the values of the losses dictionary for 2 GPUs and sum the loss values.
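The same dictionary-and-sum pattern as a runnable sketch. The per-GPU outputs are simulated here with two CPU tensors standing in for output[0] and output[1], since the aggregation logic itself is device-independent:

```python
import torch

# Hypothetical per-GPU outputs: two tensors standing in for the
# slices a DataParallel forward pass would produce on each GPU.
output = [torch.ones(4, 2), torch.full((4, 2), 3.0)]

# One scalar loss per GPU, keyed by device name.
losses = {f'gpu_{i}': output[i].mean() for i in range(2)}

# Sum over the dictionary's values to get the total loss.
total_loss = sum(loss for loss in losses.values())
print(total_loss.item())  # -> 4.0 (mean 1.0 + mean 3.0)
```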