PyTorch · ~5 mins

Multi-GPU training in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is Multi-GPU training in PyTorch?
Multi-GPU training means using more than one graphics card to train a model faster by splitting the work across GPUs.
beginner
What PyTorch class helps to easily use multiple GPUs for training?
The <code>torch.nn.DataParallel</code> class wraps a model to run it on multiple GPUs by splitting input data automatically.
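As a minimal sketch (assuming a toy <code>nn.Linear</code> model for illustration), wrapping a model looks like this; with one or zero GPUs the wrapper is skipped and the model runs unchanged:

```python
import torch
import torch.nn as nn

# Minimal sketch: a toy model wrapped in DataParallel.
# The wrapper only helps with more than one GPU, so we skip it otherwise.
model = nn.Linear(10, 2)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the model across all visible GPUs

x = torch.randn(4, 10)  # a batch of 4 samples
out = model(x)          # DataParallel scatters this batch and gathers the outputs
print(out.shape)        # torch.Size([4, 2])
```

Note that the wrapped model is called exactly like the original one; the splitting and gathering happen inside the forward pass.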
beginner
Why do we need to move the model and data to GPUs in PyTorch multi-GPU training?
Because GPUs do the heavy math work, the model and data must be on GPUs to speed up training. PyTorch needs explicit commands to move them.
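A minimal sketch of the explicit moves (falling back to CPU when no GPU is available): both the model's parameters and the batch must end up on the same device, because a model on GPU cannot consume tensors still sitting on the CPU.

```python
import torch
import torch.nn as nn

# Pick the device, then move BOTH the model and the data to it.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(8, 1).to(device)      # moves the parameters
inputs = torch.randn(16, 8).to(device)  # moves the input batch
targets = torch.randn(16, 1).to(device)

loss = nn.functional.mse_loss(model(inputs), targets)
print(loss.item())
```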
intermediate
What is a common challenge when using multiple GPUs for training?
Synchronizing gradients and combining results from all GPUs can be tricky, but PyTorch handles this automatically with DataParallel.
intermediate
How does DataParallel split the input data across GPUs?
It splits the input batch into smaller chunks, sends each chunk to a different GPU, runs the model on each chunk, then combines the outputs.
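This scatter/gather step can be illustrated with plain tensor operations (a rough analogy to what DataParallel does internally, here pretending 2 GPUs are visible):

```python
import torch

# Scatter: the batch dimension (dim 0) is chunked, one chunk per GPU.
batch = torch.arange(8).reshape(8, 1)
chunks = torch.chunk(batch, 2, dim=0)  # as if 2 GPUs were visible
print([c.shape for c in chunks])       # [torch.Size([4, 1]), torch.Size([4, 1])]

# Gather: after each replica runs, the per-chunk outputs are
# concatenated back together on the main device.
gathered = torch.cat(chunks, dim=0)
print(torch.equal(gathered, batch))    # True
```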
Which PyTorch class is commonly used for simple multi-GPU training?
A. <code>torch.optim.Adam</code>
B. <code>torch.nn.DataParallel</code>
C. <code>torch.utils.data.DataLoader</code>
D. <code>torch.nn.Sequential</code>
Before training on GPUs, what must you do with your model in PyTorch?
A. Call <code>model.to('cuda')</code> to move it to GPU
B. Call <code>model.train()</code> only
C. Call <code>model.eval()</code>
D. Nothing, PyTorch moves it automatically
How does DataParallel handle the outputs from multiple GPUs?
A. It discards outputs from all but one GPU
B. It leaves outputs on each GPU separately
C. It combines outputs back on the main GPU
D. It sends outputs to CPU automatically
What is a benefit of using multiple GPUs for training?
A. Less memory usage on each GPU
B. No need to move data to GPU
C. Simpler code without changes
D. Faster training by parallelizing work
Which of these is NOT true about DataParallel?
A. It requires manual gradient synchronization
B. It wraps the model for multi-GPU use
C. It combines outputs on the main GPU
D. It automatically splits input batches
Explain how PyTorch's DataParallel helps in multi-GPU training.
Think about how input and output are handled across GPUs.
Describe the steps to prepare a PyTorch model and data for multi-GPU training.
Consider what needs to be on GPU and how to enable multi-GPU.
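The preparation steps discussed above can be sketched end to end (a hedged example using a toy model and random data; the model, sizes, and learning rate are all illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# 1) Pick the device, 2) build the model, 3) wrap it in DataParallel
#    when more than one GPU is visible, 4) move it to the device,
# 5) move each batch to the same device inside the training loop.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 1)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

dataset = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
loader = DataLoader(dataset, batch_size=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for x, y in loader:
    x, y = x.to(device), y.to(device)  # data follows the model's device
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                    # DataParallel syncs gradients for you
    optimizer.step()

print('final loss:', loss.item())
```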