Recall & Review
beginner
What is DistributedDataParallel in PyTorch?
DistributedDataParallel (DDP) is a PyTorch module that helps train models on multiple GPUs or machines by splitting data and synchronizing gradients efficiently.
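The wrapping step can be sketched as follows. This is a minimal single-process illustration (real training launches one process per GPU, e.g. via `torchrun`); the `gloo` backend, rank 0, world size 1, and the address/port values are assumptions chosen so the sketch runs on a CPU-only machine.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumed single-process setup for illustration only; real jobs launch
# one process per GPU and use the "nccl" backend on CUDA devices.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 1)   # toy model
ddp_model = DDP(model)           # wrap: gradients now sync across processes

x = torch.randn(4, 10)
loss = ddp_model(x).sum()
loss.backward()                  # gradients are all-reduced here

dist.destroy_process_group()
```

After wrapping, the training loop looks exactly like single-GPU code; the synchronization happens transparently inside `backward()`.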
intermediate
Why use DistributedDataParallel instead of DataParallel?
DistributedDataParallel is faster and more scalable because it runs one process per GPU and overlaps gradient synchronization with the backward pass, while DataParallel uses a single process (bottlenecked by Python's GIL) and replicates the model on every forward pass, which makes it slower.
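The difference in launch model can be sketched like this. The `DataParallel` call below is a real API; on a machine without GPUs it simply runs the wrapped module in the single process, which is enough to show the single-process design.

```python
import torch
from torch.nn import DataParallel

# Single-process DataParallel: one Python process drives all GPUs,
# replicating the model and scattering inputs on every forward call.
model = torch.nn.Linear(10, 1)
dp_model = DataParallel(model)   # falls back to plain CPU execution without GPUs
out = dp_model(torch.randn(4, 10))

# DDP instead launches one process per device, typically via:
#   torchrun --nproc_per_node=4 train.py
# and each process wraps its own model copy in DistributedDataParallel.
```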
intermediate
How does DistributedDataParallel synchronize model updates?
DDP synchronizes gradients by averaging them across all processes after each backward pass, ensuring all model copies stay consistent during training.
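The averaging can be illustrated with the underlying collective operation. This is a sketch of what DDP does for you after `backward()`, not code you write yourself; the world size of 1 and the gradient values are assumptions so the example runs standalone.

```python
import os
import torch
import torch.distributed as dist

# Assumed single-process group so the sketch is runnable without GPUs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

grad = torch.tensor([2.0, 4.0])              # this process's local gradient
dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # sum gradients across processes
grad /= dist.get_world_size()                # divide by N to get the average

dist.destroy_process_group()
```

With several processes, every rank ends up holding the same averaged gradient, which is why all model copies stay in sync after the optimizer step.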
beginner
What is the role of torch.distributed.init_process_group() in DDP?
It initializes the communication between processes, setting up the environment so that DistributedDataParallel can coordinate training across GPUs or machines.
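A minimal setup sketch: when launched with `torchrun`, the `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` environment variables are set for you; the defaults below are assumptions that let the example run as a single process.

```python
import os
import torch.distributed as dist

# torchrun normally provides these; set fallbacks so this runs standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

dist.init_process_group(backend="gloo")  # "nccl" is the usual choice on GPUs
rank, world = dist.get_rank(), dist.get_world_size()
dist.destroy_process_group()
```

Every process in the job must call `init_process_group()` before any DDP construction or collective communication.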
beginner
What must you do with your dataset when using DistributedDataParallel?
You should use a DistributedSampler to split the dataset so each process gets a unique subset, avoiding overlap and ensuring efficient training.
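The sharding can be sketched as follows. The dataset contents and the explicit `num_replicas`/`rank` arguments are assumptions so the example runs without a process group; in real code the sampler reads rank and world size from the initialized group.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(8).float())  # 8 toy samples

# Pretend we are rank 0 of 2 processes; each rank sees a disjoint half.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=False)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

indices = list(sampler)  # the sample indices this rank will iterate over
```

In a real training loop you would also call `sampler.set_epoch(epoch)` at the start of each epoch so that shuffling produces a different ordering every epoch.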
What does DistributedDataParallel primarily help with?
DistributedDataParallel is designed to train models efficiently across multiple GPUs or machines by distributing data and synchronizing gradients.
Which function initializes communication between processes in DDP?
torch.distributed.init_process_group() sets up the communication needed for DistributedDataParallel to work.
What is the purpose of DistributedSampler in DDP?
DistributedSampler ensures each process gets a unique subset of data, avoiding overlap during distributed training.
How does DDP synchronize model updates?
DDP averages gradients from all processes to keep model copies consistent.
Compared to DataParallel, DistributedDataParallel is:
DistributedDataParallel uses multiple processes and parallel communication, making it faster and more scalable than DataParallel.
Explain how DistributedDataParallel works to train a model on multiple GPUs.
Think about how data and model updates are shared across GPUs.
Describe the steps needed to prepare your PyTorch training code to use DistributedDataParallel.
Focus on setup and data handling for distributed training.
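The preparation steps above can be combined into one minimal sketch: initialize the process group, wrap the model, shard the data, then train as usual. The single-process `gloo` setup, the toy model, and the hyperparameters are all assumptions so the sketch runs on one CPU; a real job would use `torchrun` and the `nccl` backend.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def train():
    # 1. Initialize the process group (torchrun normally sets these env vars;
    #    the fallbacks below make the sketch runnable as a single process).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29503")
    dist.init_process_group("gloo", rank=0, world_size=1)

    # 2. Wrap the model in DDP (move it to this rank's GPU first if using CUDA).
    model = DDP(torch.nn.Linear(10, 1))

    # 3. Shard the data with DistributedSampler so each rank gets unique samples.
    dataset = TensorDataset(torch.randn(16, 10), torch.randn(16, 1))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=4, sampler=sampler)

    # 4. Ordinary training loop; set_epoch varies the shuffle each epoch.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(2):
        sampler.set_epoch(epoch)
        for x, y in loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()   # gradients synchronized across ranks here
            optimizer.step()

    dist.destroy_process_group()
    return loss.item()

final_loss = train()
```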