PyTorch · ~5 mins

Multi-GPU training in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is Multi-GPU training in PyTorch?
Multi-GPU training means using more than one graphics card to train a model faster by splitting the work across GPUs.
beginner
What PyTorch class helps to easily use multiple GPUs for training?
The <code>torch.nn.DataParallel</code> class wraps a model to run it on multiple GPUs by splitting input data automatically.
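As a minimal sketch (assuming a toy <code>nn.Linear</code> model for illustration), wrapping a model looks like this; with one or zero GPUs the wrapper is skipped and the model runs unchanged:

```python
import torch
import torch.nn as nn

# Minimal sketch: a toy model wrapped in DataParallel.
# The wrapper only helps with more than one GPU, so we skip it otherwise.
model = nn.Linear(10, 2)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the model across all visible GPUs

x = torch.randn(4, 10)  # a batch of 4 samples
out = model(x)          # DataParallel scatters this batch and gathers the outputs
print(out.shape)        # torch.Size([4, 2])
```

Note that the wrapped model is called exactly like the original one; the splitting and gathering happen inside the forward pass.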
beginner
Why do we need to move the model and data to GPUs in PyTorch multi-GPU training?
Because GPUs do the heavy math work, the model and data must be on GPUs to speed up training. PyTorch needs explicit commands to move them.
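A minimal sketch of the explicit moves (falling back to CPU when no GPU is available): both the model's parameters and the batch must end up on the same device, because a model on GPU cannot consume tensors still sitting on the CPU.

```python
import torch
import torch.nn as nn

# Pick the device, then move BOTH the model and the data to it.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(8, 1).to(device)      # moves the parameters
inputs = torch.randn(16, 8).to(device)  # moves the input batch
targets = torch.randn(16, 1).to(device)

loss = nn.functional.mse_loss(model(inputs), targets)
print(loss.item())
```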
intermediate
What is a common challenge when using multiple GPUs for training?
Synchronizing gradients and combining results from all GPUs can be tricky, but PyTorch handles this automatically with DataParallel.
intermediate
How does DataParallel split the input data across GPUs?
It splits the input batch into smaller chunks, sends each chunk to a different GPU, runs the model on each chunk, then combines the outputs.
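This scatter/gather step can be illustrated with plain tensor operations (a rough analogy to what DataParallel does internally, here pretending 2 GPUs are visible):

```python
import torch

# Scatter: the batch dimension (dim 0) is chunked, one chunk per GPU.
batch = torch.arange(8).reshape(8, 1)
chunks = torch.chunk(batch, 2, dim=0)  # as if 2 GPUs were visible
print([c.shape for c in chunks])       # [torch.Size([4, 1]), torch.Size([4, 1])]

# Gather: after each replica runs, the per-chunk outputs are
# concatenated back together on the main device.
gathered = torch.cat(chunks, dim=0)
print(torch.equal(gathered, batch))    # True
```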
Which PyTorch class is commonly used for simple multi-GPU training?
A. <code>torch.optim.Adam</code>
B. <code>torch.nn.DataParallel</code>
C. <code>torch.utils.data.DataLoader</code>
D. <code>torch.nn.Sequential</code>
Before training on GPUs, what must you do with your model in PyTorch?
A. Call <code>model.to('cuda')</code> to move it to GPU
B. Call <code>model.train()</code> only
C. Call <code>model.eval()</code>
D. Nothing, PyTorch moves it automatically
How does DataParallel handle the outputs from multiple GPUs?
A. It discards outputs from all but one GPU
B. It leaves outputs on each GPU separately
C. It combines outputs back on the main GPU
D. It sends outputs to CPU automatically
What is a benefit of using multiple GPUs for training?
A. Less memory usage on each GPU
B. No need to move data to GPU
C. Simpler code without changes
D. Faster training by parallelizing work
Which of these is NOT true about DataParallel?
A. It requires manual gradient synchronization
B. It wraps the model for multi-GPU use
C. It combines outputs on the main GPU
D. It automatically splits input batches
Explain how PyTorch's DataParallel helps in multi-GPU training.
Think about how input and output are handled across GPUs.
Describe the steps to prepare a PyTorch model and data for multi-GPU training.
Consider what needs to be on GPU and how to enable multi-GPU.
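The preparation steps discussed above can be sketched end to end (a hedged example using a toy model and random data; the model, sizes, and learning rate are all illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# 1) Pick the device, 2) build the model, 3) wrap it in DataParallel
#    when more than one GPU is visible, 4) move it to the device,
# 5) move each batch to the same device inside the training loop.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 1)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

dataset = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
loader = DataLoader(dataset, batch_size=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for x, y in loader:
    x, y = x.to(device), y.to(device)  # data follows the model's device
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                    # DataParallel syncs gradients for you
    optimizer.step()

print('final loss:', loss.item())
```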