How to Use SubsetRandomSampler in PyTorch for Custom Sampling
Use SubsetRandomSampler in PyTorch to randomly sample elements from a specified list of indices. Pass it to the DataLoader via the sampler argument to control which subset of data is used during training or evaluation.
Syntax
The SubsetRandomSampler takes a list or tensor of indices and samples elements randomly from these indices without replacement.
It is typically used with DataLoader by passing it as the sampler argument.
indices: List or tensor of dataset indices to sample from.
python
torch.utils.data.SubsetRandomSampler(indices)
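To see what the sampler yields on its own, it can be iterated directly without a DataLoader. The following is a minimal sketch with arbitrary example indices. It checks that one pass returns every supplied index exactly once, in shuffled order.

```python
import torch
from torch.utils.data import SubsetRandomSampler

# Arbitrary example indices, chosen only for illustration.
indices = [0, 2, 4, 6, 8]
sampler = SubsetRandomSampler(indices)

# One full pass over the sampler corresponds to one epoch of indices.
order = [int(i) for i in sampler]

print(len(sampler))   # number of indices supplied
print(order)          # the same indices in a random order
print(sorted(order))  # sorting recovers the original index list
```

Iterating the sampler a second time produces a fresh permutation of the same indices.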
Example
This example uses SubsetRandomSampler to load 5 chosen elements from a dataset of 10 elements, visiting them in random order.
The sampler is passed to the DataLoader so that only the specified subset is loaded, in batches of 2.
python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# Create a simple dataset of 10 numbers
data = torch.arange(10)
dataset = TensorDataset(data)

# Define indices to sample from (subset of dataset)
subset_indices = [1, 3, 5, 7, 9]

# Create SubsetRandomSampler with these indices
sampler = SubsetRandomSampler(subset_indices)

# Create DataLoader with sampler
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

# Iterate and print batches
for batch in loader:
    print(batch[0].tolist())
Output (one possible run; the order changes between runs because sampling is random)
[3, 7]
[1, 9]
[5]
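Because the order is random, two runs of the example above will usually print different batches. If a repeatable order is needed, one option is to give the sampler its own seeded torch.Generator. The sketch below assumes a PyTorch version in which SubsetRandomSampler accepts a generator argument. The helper name first_epoch and the seed value are illustrative, not part of the original example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

dataset = TensorDataset(torch.arange(10))
subset_indices = [1, 3, 5, 7, 9]

def first_epoch(seed):
    """Hypothetical helper: return the batches of one epoch for a given seed."""
    # A dedicated, seeded generator drives the sampler's shuffling.
    g = torch.Generator()
    g.manual_seed(seed)
    sampler = SubsetRandomSampler(subset_indices, generator=g)
    loader = DataLoader(dataset, batch_size=2, sampler=sampler)
    return [batch[0].tolist() for batch in loader]

run_a = first_epoch(0)
run_b = first_epoch(0)

# The same seed gives the same batch order in both runs.
print(run_a == run_b)
```

The generator only fixes the shuffling done by the sampler; other sources of randomness, such as data augmentation, need their own seeding.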
Common Pitfalls
- Do not use shuffle=True in DataLoader when using SubsetRandomSampler. The two options are mutually exclusive, and DataLoader raises a ValueError if both are given.
- Ensure the indices passed to SubsetRandomSampler are valid and within the dataset range.
- Remember that SubsetRandomSampler samples without replacement, so each index appears once per epoch.
python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Same dataset and sampler as in the example above
dataset = TensorDataset(torch.arange(10))
sampler = SubsetRandomSampler([1, 3, 5, 7, 9])

# Wrong: using shuffle=True together with a sampler raises a ValueError
# loader = DataLoader(dataset, batch_size=2, sampler=sampler, shuffle=True)

# Right: use either sampler or shuffle, not both
loader = DataLoader(dataset, batch_size=2, sampler=sampler)
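The first two pitfalls can be observed directly. The sketch below triggers both errors on a toy dataset and catches them. It assumes the default num_workers=0, so exceptions surface unchanged in the main process. The helper out_of_range is hypothetical (not a PyTorch function) and shows one simple way to validate indices before building the sampler.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

dataset = TensorDataset(torch.arange(10))

# Pitfall 1: sampler and shuffle=True together are rejected when the
# DataLoader is constructed.
try:
    DataLoader(dataset, batch_size=2,
               sampler=SubsetRandomSampler([1, 3, 5]), shuffle=True)
    shuffle_error = None
except ValueError as exc:
    shuffle_error = exc
print(type(shuffle_error).__name__)

# Pitfall 2: the sampler does not validate indices, so an out-of-range
# index is only noticed when the dataset is actually indexed.
bad_loader = DataLoader(dataset, batch_size=2,
                        sampler=SubsetRandomSampler([0, 1, 100]))
try:
    for _ in bad_loader:
        pass
    index_error = None
except IndexError as exc:
    index_error = exc
print(type(index_error).__name__)

def out_of_range(indices, dataset_len):
    """Hypothetical guard: list indices outside 0..dataset_len-1.

    Negative indices are reported too, by choice, although tensors
    would accept them.
    """
    return [i for i in indices if not 0 <= i < dataset_len]

print(out_of_range([0, 1, 100], len(dataset)))
```

Checking indices up front moves the failure from somewhere in the middle of an epoch to the point where the sampler is created.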
Quick Reference
- Purpose: Randomly sample from a subset of dataset indices.
- Use with: Pass to DataLoader as sampler.
- Behavior: Samples without replacement each epoch.
- Do not combine: shuffle=True with sampler.
Key Takeaways
SubsetRandomSampler samples randomly from a specified list of dataset indices without replacement.
Pass SubsetRandomSampler to DataLoader's sampler argument to control data loading from subsets.
Do not combine shuffle=True with a sampler; DataLoader treats the two options as mutually exclusive and raises an error.
Ensure indices are valid and within the dataset range to prevent runtime errors.
SubsetRandomSampler is useful for creating custom random subsets for training or validation.
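A common way to apply the last point is a train/validation split over a single dataset. The sketch below is one possible approach under stated assumptions: the toy data, the 80/20 ratio, the seed, and the batch size are all chosen for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# Toy dataset: 10 samples with 1 feature each, plus integer labels.
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(10) % 2
dataset = TensorDataset(features, labels)

# Shuffle all indices once, then cut them into two disjoint parts.
g = torch.Generator().manual_seed(0)
perm = torch.randperm(len(dataset), generator=g).tolist()
split = int(0.8 * len(dataset))  # assumed 80/20 split
train_idx, val_idx = perm[:split], perm[split:]

# Each loader draws only from its own index list.
train_loader = DataLoader(dataset, batch_size=4,
                          sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(dataset, batch_size=4,
                        sampler=SubsetRandomSampler(val_idx))

n_train = sum(x.shape[0] for x, _ in train_loader)
n_val = sum(x.shape[0] for x, _ in val_loader)
print(n_train, n_val)
```

Both loaders share the same underlying dataset object, so no data is copied; only the index lists differ.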