PyTorch · ~10 mins

DistributedDataParallel in PyTorch - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to initialize the distributed process group.

PyTorch
import torch.distributed as dist

dist.init_process_group(backend=[1], init_method='env://')
A. 'gloo'
B. 'nccl'
C. 'mpi'
D. 'cuda'
Common Mistakes
Using 'cuda' as the backend; 'cuda' is a device type, not a valid backend name.
Confusing 'nccl' with 'gloo' in CPU-only setups ('nccl' requires NVIDIA GPUs).
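For reference, a minimal single-process sketch of the completed answer. The address, port, and single-process world size are assumptions so the snippet runs standalone; a real job would be launched with torchrun, which sets these environment variables itself.

```python
import os
import torch
import torch.distributed as dist

# The env:// rendezvous reads these variables; torchrun normally sets them.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# 'nccl' is the backend for NVIDIA GPUs; 'gloo' works on CPU-only machines.
backend = "nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend=backend, init_method="env://")

rank, world_size = dist.get_rank(), dist.get_world_size()
print(rank, world_size)  # 0 1 in this single-process run
dist.destroy_process_group()
```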
2. Fill in the blank (medium)

Complete the code to wrap the model with DistributedDataParallel.

PyTorch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

model = nn.Linear(10, 5).to(device)
model = [1](model)
A. nn.DataParallel
B. dist.DataParallel
C. DDP
D. nn.parallel.DistributedDataParallel
Common Mistakes
Using nn.DataParallel instead of DistributedDataParallel.
Not moving the model to the device before wrapping it.
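The completed pattern, shown as a runnable single-process CPU sketch (the gloo backend, port choice, and world size of 1 are assumptions for illustration): move the model to its device first, then wrap it.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process rendezvous for illustration; a real job uses torchrun.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 5).to(device)  # move to device *before* wrapping
model = DDP(model)                   # DDP, not nn.DataParallel

out = model(torch.randn(4, 10, device=device))
print(out.shape)  # torch.Size([4, 5])
dist.destroy_process_group()
```

Unlike nn.DataParallel (one process, replicated each forward pass), DDP keeps one process per device and only synchronizes gradients, which is why it is the recommended approach.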
3. Fill in the blank (hard)

Fix the error in the code to correctly set the device for the process.

PyTorch
import torch
import torch.distributed as dist
import os

local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device([1])
A. torch.device('cuda')
B. local_rank
C. dist.get_rank()
D. 0
Common Mistakes
Setting the device to 0 in every process, so all processes contend for the same GPU.
Using dist.get_rank(), which is the global rank across all nodes, not the local rank on this node.
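A guarded sketch of the fix: under torchrun, each process on a node reads its own LOCAL_RANK and pins a distinct GPU. The fallback default of 0 is an assumption here only so the snippet runs outside a launcher.

```python
import os
import torch

# torchrun exports LOCAL_RANK; default to 0 so the sketch runs standalone.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

if torch.cuda.is_available():
    # Pin this process to its own GPU. dist.get_rank() would be wrong on
    # multi-node jobs, because it is the *global* rank, not per-node.
    torch.cuda.set_device(local_rank)

device = torch.device("cuda", local_rank) if torch.cuda.is_available() else torch.device("cpu")
print(device)
```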
4. Fill in the blank (hard)

Fill both blanks to create a DistributedSampler and DataLoader for distributed training.

PyTorch
import torch.distributed as dist
from torch.utils.data import DataLoader, [1]

sampler = [2](dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank())
dataloader = DataLoader(dataset, batch_size=32, sampler=sampler)
A. DistributedSampler
B. RandomSampler
D. SequentialSampler
Common Mistakes
Using RandomSampler, which does not partition the data across processes.
Omitting the sampler entirely, so every process trains on the same duplicated data.
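The completed sampler setup can be checked without launching any processes by passing num_replicas and rank explicitly (an assumption here, standing in for dist.get_world_size() and dist.get_rank(), so no process group is needed). Each rank sees a disjoint strided slice of the dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

dataset = TensorDataset(torch.arange(8).float())

# Explicit num_replicas/rank stand in for the process-group queries.
s0 = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=False)
s1 = DistributedSampler(dataset, num_replicas=2, rank=1, shuffle=False)
print(sorted(s0), sorted(s1))  # [0, 2, 4, 6] [1, 3, 5, 7] -- no overlap

loader = DataLoader(dataset, batch_size=2, sampler=s0)
print(len(loader))  # 2 batches: this rank owns 4 of the 8 samples
```

In a real loop with shuffle=True, call sampler.set_epoch(epoch) at the start of each epoch so the shuffle order differs across epochs.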
5. Fill in the blank (hard)

Fill all three blanks to correctly perform a training step with DistributedDataParallel.

PyTorch
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.[1]()
optimizer.[2]()
if dist.get_rank() == 0:
    print('Loss:', loss.[3]().item())
A. backward
B. step
C. grad
D. detach
Common Mistakes
Calling optimizer.step() before loss.backward().
Printing the loss without detaching it first, which keeps the logged value tied to the autograd graph.
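The full step, as a runnable single-process CPU sketch (the gloo backend, toy model, random data, and SGD optimizer are assumptions for illustration). backward() is where DDP all-reduces gradients across ranks, and only rank 0 logs:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = DDP(nn.Linear(10, 5))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
inputs, targets = torch.randn(4, 10), torch.randn(4, 5)

optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()    # DDP all-reduces gradients across processes here
optimizer.step()   # must come after backward()

if dist.get_rank() == 0:  # log from one rank only to avoid duplicate output
    print("Loss:", loss.detach().item())
dist.destroy_process_group()
```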