Which statement best describes data parallelism in machine learning training?
Think about whether the model or the data is split across devices.
Data parallelism means copying the full model to multiple devices, with each device training on a different subset of the data at the same time; the per-device gradients are then averaged so all model replicas stay in sync.
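A minimal sketch of the idea in plain Python (no GPUs, hypothetical names): each simulated "device" holds the same weight and computes a gradient on its own data shard, and averaging the shard gradients reproduces the full-batch gradient.

```python
def grad_mse(w, shard):
    # Gradient of mean squared error for the 1-D model y_hat = w * x.
    n = len(shard)
    return sum(2 * (w * x - y) * x for x, y in shard) / n

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5

# The data is split across two simulated devices; the model (w) is replicated.
shard0, shard1 = data[:2], data[2:]
g0, g1 = grad_mse(w, shard0), grad_mse(w, shard1)

# "All-reduce" step: average the per-device gradients.
g_avg = (g0 + g1) / 2
g_full = grad_mse(w, data)
print(g_avg, g_full)  # -22.5 -22.5 -- the shard average matches the full batch
```

With equal-size shards the averaged gradient is identical to the full-batch gradient, which is why each replica can apply the same update and stay synchronized.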
Which option correctly explains model parallelism in machine learning?
Consider whether the model or the data is divided across devices.
Model parallelism splits the model itself into parts, each running on different devices to handle large models that don't fit on one device.
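A toy sketch of the split in plain Python (hypothetical names, no GPUs): the model's layers are divided into two stages, each "living" on a different simulated device, and the intermediate activation is handed from the first stage to the second.

```python
def relu(x):
    return [max(0.0, v) for v in x]

def linear(x, weights):
    # weights: one row of input weights per output unit.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]

# Stage 0 ("device 0") holds only the first layer's weights.
w_stage0 = [[1.0, -1.0], [0.5, 0.5]]
# Stage 1 ("device 1") holds only the second layer's weights.
w_stage1 = [[1.0, 1.0]]

x = [2.0, 1.0]
h = relu(linear(x, w_stage0))  # computed on device 0
y = linear(h, w_stage1)        # activation transferred, computed on device 1
print(y)  # [2.5]
```

Because no single device ever holds both weight sets, each device only needs memory for its own stage, at the cost of transferring activations between devices.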
What is the output of this code when setting up data parallelism with PyTorch's DataParallel on 2 GPUs?
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
model = nn.DataParallel(model, device_ids=[0, 1])
print(model.device_ids)
Check the attribute that stores device IDs in DataParallel.
The device_ids attribute stores the GPUs used for data parallelism in the order specified, so this prints [0, 1].
You split a large model across two GPUs using model parallelism, but get a CUDA out-of-memory error on the first GPU. What is the most likely cause?
Think about how model parts are distributed and GPU memory limits.
If one GPU holds too many model layers, it can run out of memory even if the other GPU is free.
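One way to reason about the fix, sketched in plain Python with made-up numbers: pick the split point that most evenly divides the parameter memory between the two GPUs, rather than piling most layers onto the first one.

```python
# Hypothetical per-layer parameter counts (illustrative only).
layer_sizes = [400, 300, 300, 200, 100]

# Choose the split point that best balances the two halves.
best_split = min(
    range(1, len(layer_sizes)),
    key=lambda i: abs(sum(layer_sizes[:i]) - sum(layer_sizes[i:])),
)
gpu0, gpu1 = layer_sizes[:best_split], layer_sizes[best_split:]
print(best_split, sum(gpu0), sum(gpu1))  # 2 700 600
```

A real balancing pass would also account for activation memory, not just parameters, but the principle is the same: the split point, not the total model size, determines whether one GPU overflows.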
Which scenario best justifies using model parallelism over data parallelism?
Consider the main limitation that model parallelism solves.
Model parallelism is used when the model size exceeds one device's memory, requiring splitting the model across devices.