Given the following PyTorch DataLoader setup, how many batches will it produce?
from torch.utils.data import DataLoader, TensorDataset
import torch

data = torch.arange(10)
dataset = TensorDataset(data)
dataloader = DataLoader(dataset, batch_size=3, shuffle=False)
batches = list(dataloader)
print(len(batches))
Think about how many full batches of size 3 fit into 10 items, and what happens to the leftover items.
With 10 items and batch_size=3 (and drop_last left at its default of False), you get 3 full batches (3 * 3 = 9) plus 1 partial batch holding the leftover item, totaling ceil(10 / 3) = 4 batches.
You want your model to see data in a different order every time you train. Which DataLoader parameter should you set?
Think about which option changes the order of data samples.
Setting shuffle=True makes DataLoader shuffle the dataset at the start of each epoch.
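A minimal sketch of this behavior, using a small TensorDataset (not tied to any question above): each pass over the loader draws a fresh random order, so two epochs will almost always differ.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A tiny dataset of the integers 0..9
dataset = TensorDataset(torch.arange(10))

# shuffle=True re-randomizes the sample order at the start of each epoch
loader = DataLoader(dataset, batch_size=5, shuffle=True)

for epoch in range(2):
    order = [int(x) for batch in loader for x in batch[0]]
    # Every epoch still visits all 10 samples exactly once,
    # but (with high probability) in a different order.
    print(f"epoch {epoch}: {order}")
```

Note that shuffling permutes which samples land in which batch; it never drops or repeats samples within an epoch.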
Consider a dataset with 10 samples and batch_size=4. What happens if drop_last=True?
Think about what happens to incomplete batches when drop_last is True.
drop_last=True causes DataLoader to discard the final batch if it has fewer samples than batch_size. With 10 samples and batch_size=4, the last 2 samples are dropped, so you get 2 batches instead of 3.
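A short sketch comparing the two settings side by side (batch counts follow directly from 10 samples and batch_size=4):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10))  # 10 samples

# drop_last=False (the default): 2 full batches of 4, plus a final batch of 2
keep = DataLoader(dataset, batch_size=4, drop_last=False)

# drop_last=True: the incomplete final batch of 2 is discarded
drop = DataLoader(dataset, batch_size=4, drop_last=True)

print(len(list(keep)))  # 3
print(len(list(drop)))  # 2
```

Dropping the ragged last batch is common when downstream code assumes a fixed batch dimension (for example, batch normalization with very small batches).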
Examine the code below. When run as a script on Windows, why does it raise a RuntimeError about worker processes?
from torch.utils.data import DataLoader, TensorDataset
import torch

data = torch.arange(5)
dataset = TensorDataset(data)
dataloader = DataLoader(dataset, batch_size=2, num_workers=2)
for batch in dataloader:
    print(batch)
Consider platform-specific multiprocessing rules in Python.
On Windows (and any platform using the 'spawn' multiprocessing start method), using num_workers > 0 requires that DataLoader iteration happen inside an 'if __name__ == "__main__"' guard. Each worker process re-imports the main script, and without the guard that re-import tries to spawn workers again, raising a RuntimeError.
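A minimal sketch of the corrected layout (the main() wrapper is one common convention, not required by PyTorch itself): the module-level code is safe to re-import, and the worker-spawning iteration only runs in the original process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def main():
    data = torch.arange(5)
    dataset = TensorDataset(data)
    # num_workers=2 starts worker processes; on spawn-based platforms each
    # worker re-imports this module, so iteration must be behind the guard.
    dataloader = DataLoader(dataset, batch_size=2, num_workers=2)
    for batch in dataloader:
        print(batch)


if __name__ == "__main__":
    main()  # only runs in the original process, not in re-imported workers
```

On Linux, where the default start method is 'fork', the unguarded version often appears to work, which is why this bug typically surfaces only when code is moved to Windows or macOS.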
Why would you set num_workers > 0 in a DataLoader? Choose the best explanation.
Think about how multiple workers affect data loading speed.
Multiple workers load data in parallel, which can speed up training by reducing data loading bottlenecks.
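This effect can be sketched with a hypothetical SlowDataset that simulates per-sample I/O latency (the class name and sleep times are illustrative assumptions, not part of the question): with workers, samples are fetched in parallel instead of one at a time in the main process.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset that simulates slow disk reads or decoding."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        time.sleep(0.05)  # simulate I/O latency per sample
        return torch.tensor(idx)


def time_loader(num_workers):
    loader = DataLoader(SlowDataset(), batch_size=2, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:  # just iterate; the cost is in __getitem__
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"0 workers: {time_loader(0):.2f}s")
    # Typically faster: two workers fetch samples concurrently
    print(f"2 workers: {time_loader(2):.2f}s")
```

The speedup depends on how expensive __getitem__ is relative to the per-worker process overhead; for tiny in-memory datasets, num_workers=0 can actually be faster.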