
How to Use SubsetRandomSampler in PyTorch for Custom Sampling

Use SubsetRandomSampler in PyTorch to randomly sample elements from a specified list of indices. Pass it to the DataLoader via the sampler argument to control which subset of data is used during training or evaluation.

Syntax

SubsetRandomSampler takes a sequence of indices (such as a list or a tensor). Each time it is iterated, it yields those indices in a new random order, so it samples from them without replacement.

It is typically used with DataLoader by passing it as the sampler argument.

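```python
torch.utils.data.SubsetRandomSampler(indices, generator=None)
```
  • indices: Sequence of dataset indices to sample from (for example a list or a tensor).
  • generator (optional): torch.Generator used to draw the random permutation; pass a seeded generator for a reproducible order.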
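To see what the sampler produces on its own, you can iterate it directly: it yields dataset indices, not data. The sketch below passes a seeded `torch.Generator` through the `generator` argument so the permutations are reproducible. The seed value 0 is an arbitrary choice for illustration.

```python
import torch
from torch.utils.data import SubsetRandomSampler

indices = [1, 3, 5, 7, 9]

# A seeded generator makes the sequence of permutations reproducible
g = torch.Generator().manual_seed(0)
sampler = SubsetRandomSampler(indices, generator=g)

first = list(sampler)   # one full pass: every index appears exactly once
second = list(sampler)  # iterating again draws a new permutation

print(len(sampler))              # 5, the number of indices
print(sorted(first) == indices)  # True
print(first, second)
```

Recreating the generator with the same seed reproduces the same sequence of orders, which can help when debugging a training run.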

Example

This example shows how to use SubsetRandomSampler to randomly sample 5 elements from a dataset of 10 elements.

Passing the sampler to the DataLoader makes it load only the elements at those indices, in random order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# Create a simple dataset of 10 numbers
data = torch.arange(10)
dataset = TensorDataset(data)

# Define indices to sample from (subset of dataset)
subset_indices = [1, 3, 5, 7, 9]

# Create SubsetRandomSampler with these indices
sampler = SubsetRandomSampler(subset_indices)

# Create DataLoader with sampler (no shuffle argument needed)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

# Iterate and print batches; TensorDataset yields tuples, so batch[0] is the data tensor
for batch in loader:
    print(batch[0].tolist())
```
Output (one possible run; the order is random and changes on every pass):
```
[3, 7]
[1, 9]
[5]
```
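Because the sampler draws a new permutation each time the DataLoader is iterated, every epoch visits the same five indices in a different order. The sketch below, assuming the same toy dataset, collects the values seen in two epochs to make that visible. `len(loader)` is 3 because five samples at `batch_size=2` give two full batches and one partial batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

dataset = TensorDataset(torch.arange(10))
subset_indices = [1, 3, 5, 7, 9]
loader = DataLoader(dataset, batch_size=2, sampler=SubsetRandomSampler(subset_indices))

epochs = []
for epoch in range(2):
    seen = []
    # Each batch from a single-tensor TensorDataset is a one-element list
    for (values,) in loader:
        seen.extend(values.tolist())
    epochs.append(seen)
    print(f"epoch {epoch}: {seen}")  # same values, usually in a different order

print(len(loader))  # 3 batches holding 2, 2 and 1 samples
```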

Common Pitfalls

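  • Do not pass shuffle=True to DataLoader together with a sampler. The two options are mutually exclusive and DataLoader raises a ValueError; the sampler already randomizes the order.
  • Make sure every index passed to SubsetRandomSampler lies within the dataset's range. The sampler does not check this itself, so an invalid index only fails (typically with an IndexError) when the DataLoader tries to fetch that sample.
  • SubsetRandomSampler samples without replacement: each index in the list is yielded exactly once per epoch, in a freshly shuffled order.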
```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

dataset = TensorDataset(torch.arange(10))
sampler = SubsetRandomSampler([1, 3, 5, 7, 9])

# Wrong: combining a sampler with shuffle=True raises a ValueError
# loader = DataLoader(dataset, batch_size=2, sampler=sampler, shuffle=True)

# Right: pass only the sampler; it already randomizes the order
loader = DataLoader(dataset, batch_size=2, sampler=sampler)
```
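Both failure modes from the list above can be reproduced on a toy dataset. The sketch below assumes a 10-element TensorDataset like the earlier example. `make_subset_sampler` is a hypothetical helper (not part of PyTorch) that checks indices up front, so a bad index surfaces when the sampler is built rather than partway through an epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

dataset = TensorDataset(torch.arange(10))

# 1) sampler + shuffle=True is rejected as soon as the DataLoader is constructed
try:
    DataLoader(dataset, batch_size=2, sampler=SubsetRandomSampler([1, 3]), shuffle=True)
    shuffle_error = None
except ValueError as exc:
    shuffle_error = exc

# 2) An out-of-range index is accepted by the sampler and only fails at load time
bad_loader = DataLoader(dataset, batch_size=2, sampler=SubsetRandomSampler([1, 12]))
try:
    for _ in bad_loader:
        pass
    index_error = None
except IndexError as exc:
    index_error = exc


def make_subset_sampler(dataset, indices):
    """Hypothetical helper: validate indices before building the sampler."""
    bad = [i for i in indices if not 0 <= i < len(dataset)]
    if bad:
        raise ValueError(f"indices out of range for dataset of size {len(dataset)}: {bad}")
    return SubsetRandomSampler(indices)


print(type(shuffle_error).__name__, type(index_error).__name__)  # ValueError IndexError
```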

Quick Reference

  • Purpose: Randomly sample from a subset of dataset indices.
  • Use with: Pass to DataLoader as sampler.
  • Behavior: Samples without replacement each epoch.
  • Do not combine: shuffle=True with sampler.

Key Takeaways

SubsetRandomSampler samples randomly from a specified list of dataset indices without replacement.
Pass SubsetRandomSampler to DataLoader's sampler argument to control data loading from subsets.
Do not combine shuffle=True with a sampler; DataLoader rejects the combination with a ValueError.
Ensure all indices lie within the dataset range; the sampler does not check them, so invalid ones fail only when a batch is loaded.
SubsetRandomSampler is useful for creating custom random subsets for training or validation.
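One common way to build training and validation subsets, as the last point suggests, is to shuffle all indices once, split the permutation, and give each part its own SubsetRandomSampler over the same dataset. The sketch below uses random toy tensors; the 80/20 ratio, the seed, and the batch size are arbitrary illustrative choices, not recommendations from the PyTorch documentation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# Toy dataset: 100 samples with 3 features each, plus binary labels
features = torch.randn(100, 3)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

# Shuffle all indices once (seeded), then carve off 20% for validation
g = torch.Generator().manual_seed(42)
perm = torch.randperm(len(dataset), generator=g).tolist()
split = int(0.8 * len(dataset))
train_indices, val_indices = perm[:split], perm[split:]

# Both loaders read from the same dataset object; only the indices differ
train_loader = DataLoader(dataset, batch_size=16, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=16, sampler=SubsetRandomSampler(val_indices))

train_count = sum(x.shape[0] for x, _ in train_loader)
val_count = sum(x.shape[0] for x, _ in val_loader)
print(train_count, val_count)  # 80 20
```

Because the two samplers only hold index lists, no data is copied, and the two subsets cannot overlap as long as the index lists are disjoint. For validation the random order is harmless but not required.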