PyTorch · ~10 mins

Why distributed training handles large models in PyTorch - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
Task 1 - Fill in the blank (easy)

Complete the code to import the PyTorch distributed package.

PyTorch
import torch
import torch.[1] as dist
A. distributed
B. distribute
C. distutils
D. distlib
Common Mistakes
Using 'distribute' instead of 'distributed'
Confusing it with unrelated modules like 'distutils'
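Beyond the import itself, it can help to check which distributed backends the local PyTorch build actually supports. A minimal sketch, assuming only that torch is installed:

```python
# Import the distributed package and report which backends this
# PyTorch build was compiled with. Pure introspection: no process
# group is created.
import torch
import torch.distributed as dist

def available_backends():
    """Return the distributed backends compiled into this build."""
    backends = []
    if dist.is_available():           # distributed package is usable at all
        if dist.is_nccl_available():  # GPU collective backend
            backends.append("nccl")
        if dist.is_gloo_available():  # CPU backend, also a GPU fallback
            backends.append("gloo")
        if dist.is_mpi_available():   # only if built against an MPI library
            backends.append("mpi")
    return backends

print(available_backends())
```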
Task 2 - Fill in the blank (medium)

Complete the code to initialize the distributed process group.

PyTorch
dist.init_process_group(backend=[1], init_method='env://')
A. 'nccl'
B. 'mpi'
C. 'gloo'
D. 'cuda'
Common Mistakes
Using 'gloo', which is CPU-based rather than GPU-optimized
Using 'cuda', which is not a valid backend name
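The backend choice depends on hardware: 'nccl' for CUDA GPUs, 'gloo' for CPUs. Here is a single-process sketch using 'gloo' so it runs on any machine; the MASTER_ADDR/MASTER_PORT values are placeholders for illustration:

```python
# Single-process sketch of init_process_group. 'gloo' is used so this
# runs on CPU; a real multi-GPU job would pass backend='nccl' and
# launch one process per GPU (e.g. via torchrun).
import os
import torch.distributed as dist

# init_method='env://' reads the rendezvous address from these variables.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(
    backend="gloo",        # CPU backend; use 'nccl' on CUDA GPUs
    init_method="env://",  # rendezvous via environment variables
    world_size=1,          # total number of processes in the job
    rank=0,                # this process's index within the job
)

world_size, rank = dist.get_world_size(), dist.get_rank()
print(world_size, rank)  # 1 0

dist.destroy_process_group()
```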
Task 3 - Fill in the blank (hard)

Complete the code to wrap the model for distributed training.

PyTorch
model = torch.nn.parallel.[1](model)
A. ParallelDataDistributed
B. DataParallel
C. DistributedDataParallel
D. DistributedParallelData
Common Mistakes
Using 'DataParallel', which is single-process and not meant for distributed multi-node training
Misspelling the class name
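Note that DistributedDataParallel lives in torch.nn.parallel, not torch.nn. A single-process CPU sketch of the wrapping step, using a toy Linear model purely for illustration:

```python
# Wrap a toy model in DistributedDataParallel (DDP) inside a
# single-process 'gloo' group so the example runs on CPU. On GPUs you
# would use backend='nccl' and pass device_ids=[local_rank] to DDP.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group(backend="gloo", init_method="env://",
                        world_size=1, rank=0)

model = torch.nn.Linear(4, 2)  # toy model for illustration
ddp_model = DDP(model)         # gradients will be all-reduced across ranks

x = torch.randn(3, 4)
y = ddp_model(x)               # forward pass goes through the wrapper
print(y.shape)                 # torch.Size([3, 2])

dist.destroy_process_group()
```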
Task 4 - Fill in the blank (hard)

Fill both blanks to create a distributed sampler and data loader for training.

PyTorch
train_sampler = torch.utils.data.[1](dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank())
train_loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=[2])
A. DistributedSampler
B. RandomSampler
C. train_sampler
D. SequentialSampler
Common Mistakes
Using 'RandomSampler', which does not split data across processes
Passing the wrong sampler variable to DataLoader
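DistributedSampler's job is to give each rank a disjoint shard of the dataset. The sketch below passes num_replicas and rank explicitly so it runs without a process group; inside a real job you would use dist.get_world_size() and dist.get_rank() as in the question above.

```python
# Show how DistributedSampler partitions a dataset across ranks.
# Two ranks are simulated in one process by constructing the sampler
# twice with explicit num_replicas/rank.
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(8).float())  # 8 toy samples: 0..7

shards = {}
for rank in range(2):
    sampler = DistributedSampler(
        dataset, num_replicas=2, rank=rank, shuffle=False
    )
    loader = DataLoader(dataset, batch_size=2, sampler=sampler)
    shards[rank] = [int(v) for (batch,) in loader for v in batch]

print(shards)  # each rank sees a disjoint half of the dataset
```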
Task 5 - Fill in the blank (hard)

Fill all three blanks to perform a training step with distributed training.

PyTorch
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.[1]()
optimizer.[2]()
dist.barrier()  # [3] synchronization
A. backward
B. step
C. ensures all processes wait for each other
D. zero_grad
Common Mistakes
Calling optimizer.zero_grad() instead of optimizer.step()
Missing loss.backward() call
Not synchronizing processes with dist.barrier()
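Put together, one training step looks like the sequence below. This sketch runs as a single 'gloo' process on CPU; the model, loss, and data are toy placeholders, and a real job would use 'nccl' with one process per GPU.

```python
# One full distributed training step: zero grads, forward, loss,
# backward (DDP all-reduces gradients here), optimizer step, barrier.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group(backend="gloo", init_method="env://",
                        world_size=1, rank=0)

model = DDP(torch.nn.Linear(4, 2))                     # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()
inputs, labels = torch.randn(8, 4), torch.randn(8, 2)  # toy batch

optimizer.zero_grad()              # clear gradients from the last step
outputs = model(inputs)            # forward pass
loss = criterion(outputs, labels)
loss.backward()                    # DDP averages gradients across ranks
optimizer.step()                   # apply the synchronized update
dist.barrier()                     # all ranks wait here before continuing

print(loss.item())
dist.destroy_process_group()
```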