
DistributedDataParallel in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is DistributedDataParallel in PyTorch?
DistributedDataParallel (DDP) is a PyTorch module that helps train models on multiple GPUs or machines by splitting data and synchronizing gradients efficiently.
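A minimal sketch of wrapping a model in DDP, run as a single process on CPU with the gloo backend for illustration (real training launches one process per GPU, typically via torchrun; the address and port below are illustrative values):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup for illustration; under torchrun, rank and
# world_size come from the launcher instead.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 2)  # every process builds the same model
ddp_model = DDP(model)          # DDP syncs weights and hooks gradient averaging

out = ddp_model(torch.randn(4, 10))
print(out.shape)                # torch.Size([4, 2])

dist.destroy_process_group()
```

The wrapped model is used exactly like the original module; DDP only intervenes during the backward pass to synchronize gradients.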
intermediate
Why use DistributedDataParallel instead of DataParallel?
DistributedDataParallel is faster and more scalable because it runs one process per GPU and overlaps gradient synchronization with the backward pass, while DataParallel is single-process and multi-threaded, so it suffers from Python GIL contention and replicates the model on every forward pass.
intermediate
How does DistributedDataParallel synchronize model updates?
DDP synchronizes gradients by averaging them across all processes after each backward pass, ensuring all model copies stay consistent during training.
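The synchronization point can be seen in a tiny training step; this sketch uses world size 1 on CPU, so the averaging is trivial, but the call site where DDP all-reduces gradients is the same in a real multi-GPU run (address/port are illustrative):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 1))
loss = model(torch.ones(2, 4)).sum()
loss.backward()  # DDP all-reduces (averages) gradients across ranks here

# After backward(), every rank holds identical averaged gradients.
grad = model.module.weight.grad
print(grad is not None)

dist.destroy_process_group()
```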
beginner
What is the role of torch.distributed.init_process_group() in DDP?
It initializes the communication between processes, setting up the environment so that DistributedDataParallel can coordinate training across GPUs or machines.
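A typical initialization sketch: under torchrun the environment variables below are set for you and the default "env://" init method reads them, so the `setdefault` calls (with illustrative values) only matter when running standalone:

```python
import os
import torch.distributed as dist

# torchrun normally provides these; illustrative defaults for standalone runs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

dist.init_process_group(backend="gloo")  # "nccl" is the usual choice for GPUs
rank = dist.get_rank()
world_size = dist.get_world_size()
print(rank, world_size)

dist.destroy_process_group()
```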
beginner
What must you do with your dataset when using DistributedDataParallel?
You should use a DistributedSampler to split the dataset so each process gets a unique subset, avoiding overlap and ensuring efficient training.
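The sharding can be demonstrated without launching multiple processes by constructing a sampler for each hypothetical rank explicitly (`num_replicas=2` simulates two processes):

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(8).float())

# Simulate two ranks: each gets a disjoint half of the dataset.
shard0 = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=False)
shard1 = DistributedSampler(dataset, num_replicas=2, rank=1, shuffle=False)

print(sorted(shard0))
print(sorted(shard1))

# In real training, pass the sampler to the DataLoader (and do not
# also pass shuffle=True):
loader = DataLoader(dataset, batch_size=2, sampler=shard0)
```

When `shuffle=True` (the default), call `sampler.set_epoch(epoch)` at the start of each epoch so every epoch uses a different shuffle across all ranks.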
What does DistributedDataParallel primarily help with?
A. Visualizing model predictions
B. Reducing model size
C. Training models on multiple GPUs or machines efficiently
D. Improving data loading speed
Answer: C
Which function initializes communication between processes in DDP?
A. torch.distributed.init_process_group()
B. torch.nn.Module()
C. torch.optim.SGD()
D. torch.utils.data.DataLoader()
Answer: A
What is the purpose of DistributedSampler in DDP?
A. To shuffle data randomly
B. To split the dataset uniquely across processes
C. To increase batch size
D. To reduce model parameters
Answer: B
How does DDP synchronize model updates?
A. By copying weights from one GPU to another
B. By saving checkpoints
C. By sending data to CPU
D. By averaging gradients across all processes after the backward pass
Answer: D
Compared to DataParallel, DistributedDataParallel is:
A. Faster and more scalable
B. Only for CPU training
C. Slower and less scalable
D. Used for model evaluation only
Answer: A
Explain how DistributedDataParallel works to train a model on multiple GPUs.
Think about how data and model updates are shared across GPUs.
Describe the steps needed to prepare your PyTorch training code to use DistributedDataParallel.
Focus on setup and data handling for distributed training.
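The steps asked about above can be sketched end-to-end. This is a minimal single-process version (gloo backend, tiny synthetic data, illustrative address/port); a real run would launch it with torchrun, use the nccl backend, and move the model and batches to each rank's GPU:

```python
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # 1. Initialize the process group (torchrun normally supplies these values).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29503")
    dist.init_process_group("gloo", rank=0, world_size=1)
    rank = dist.get_rank()

    # 2. Build the model on this process and wrap it in DDP.
    model = DDP(torch.nn.Linear(4, 1))

    # 3. Shard the dataset so each rank sees a unique subset.
    data = TensorDataset(torch.randn(16, 4), torch.randn(16, 1))
    sampler = DistributedSampler(data, num_replicas=dist.get_world_size(), rank=rank)
    loader = DataLoader(data, batch_size=4, sampler=sampler)

    # 4. Train: backward() is where DDP averages gradients across ranks.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = F.mse_loss(model(x), y)
            loss.backward()
            opt.step()

    dist.destroy_process_group()
    return loss.item()

final_loss = main()
print(final_loss)
```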