The __getitem__ and __len__ methods let your dataset class tell PyTorch how many items it has and how to fetch each one, so the data can be fed directly into training.
__getitem__ and __len__ in PyTorch
Introduction
Implement these methods in the following situations:
When you create a custom dataset to feed data into a PyTorch model.
When you want to load images or text samples one by one during training.
When your data is not in a ready-made format and you need to control how it is accessed.
When you want to use PyTorch's DataLoader to automatically handle batching and shuffling.
Syntax
PyTorch
import torch

class YourDataset(torch.utils.data.Dataset):
    def __len__(self):
        # return number of items (number_of_items is a placeholder)
        return number_of_items

    def __getitem__(self, index):
        # return one data item at position index (data_item is a placeholder)
        return data_item
__len__ tells how many items are in your dataset.
__getitem__ returns one item when given an index number.
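To make the template concrete, here is a minimal sketch showing that Python's built-in len() and square-bracket indexing are what end up calling these two methods. The class name SquaresDataset and its contents are invented for illustration and are not part of PyTorch.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical dataset holding the squares of 0..n-1."""

    def __init__(self, n):
        self.values = [i * i for i in range(n)]

    def __len__(self):
        # Called by len(dataset)
        return len(self.values)

    def __getitem__(self, index):
        # Called by dataset[index]
        return torch.tensor(self.values[index])


dataset = SquaresDataset(5)
num_items = len(dataset)   # calls __len__
third_item = dataset[2]    # calls __getitem__ with index=2
print(num_items, third_item)
```

Here num_items is 5 and third_item holds the value 4, since 2 squared is 4.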
Examples
This dataset has 100 items.
PyTorch
def __len__(self):
    return 100
Returns the data item at position idx.
PyTorch
def __getitem__(self, idx):
    return self.data[idx]
Returns an image and its label for training.
PyTorch
def __getitem__(self, idx):
    # load_image stands for your own image-loading function
    image = load_image(self.paths[idx])
    label = self.labels[idx]
    return image, label
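As a self-contained variant of the image-and-label pattern, the sketch below replaces load_image with randomly generated tensors so that it runs without any files on disk. The class name FakeImageDataset, the 3x8x8 image size, and the alternating labels are assumptions made purely for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class FakeImageDataset(Dataset):
    def __init__(self, num_samples=6):
        # Stand-in for images loaded from disk: random 3x8x8 tensors
        self.images = torch.rand(num_samples, 3, 8, 8)
        # Stand-in labels: alternating classes 0 and 1
        self.labels = torch.arange(num_samples) % 2

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Return one (image, label) pair
        return self.images[idx], self.labels[idx]


dataset = FakeImageDataset()
image, label = dataset[0]

# The DataLoader stacks individual pairs into batched tensors
loader = DataLoader(dataset, batch_size=4, shuffle=False)
images_batch, labels_batch = next(iter(loader))
print(image.shape, images_batch.shape, labels_batch.shape)
```

A single item is one image of shape (3, 8, 8) with its label, while a batch of 4 has images of shape (4, 3, 8, 8) and labels of shape (4,).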
Sample Model
This code creates a dataset of the numbers 0 to 9. Its __getitem__ method returns double the stored number. The DataLoader loads the data in batches of 3, and the loop prints each batch.
PyTorch
import torch
from torch.utils.data import Dataset, DataLoader

class SimpleDataset(Dataset):
    def __init__(self):
        self.data = [i for i in range(10)]  # numbers 0 to 9

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx] * 2  # return double the number

# Create dataset and dataloader
dataset = SimpleDataset()
dataloader = DataLoader(dataset, batch_size=3, shuffle=False)

for batch in dataloader:
    print(batch)
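Output
The loop prints four batches as tensors holding 0, 2, 4; then 6, 8, 10; then 12, 14, 16; and finally 18. The last batch has a single item because 10 is not evenly divisible by 3.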
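With shuffle=True, the DataLoader visits the same indices in a random order, so every item still appears exactly once per pass over the data. The following is a minimal sketch of that behavior; the class name DoubledDataset and the seed value 0 are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class DoubledDataset(Dataset):
    def __init__(self):
        self.data = list(range(10))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx] * 2


# A seeded generator makes the shuffle order repeatable (seed is arbitrary)
generator = torch.Generator().manual_seed(0)
loader = DataLoader(DoubledDataset(), batch_size=3, shuffle=True, generator=generator)

seen = []
batch_sizes = []
for batch in loader:
    batch_sizes.append(len(batch))
    seen.extend(batch.tolist())

print(batch_sizes)
print(sorted(seen))
```

The order of values inside the batches depends on the seed, but the batch sizes stay 3, 3, 3, 1 and the sorted values are always the even numbers from 0 to 18.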
Important Notes
Always make sure __len__ matches the actual number of items your dataset can provide.
__getitem__ should handle any index from 0 to len(dataset)-1.
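PyTorch's DataLoader relies on both methods: it uses __len__ to know how many samples exist and calls __getitem__ to fetch each one, which is what makes batching and shuffling possible.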
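To illustrate the notes above, the rough sketch below mimics by hand what a DataLoader does with an indexable dataset: it reads len(dataset) to build the list of indices, then calls dataset[i] for each one and stacks the results. This is a simplification for teaching purposes, not PyTorch's actual implementation, and the names RangeDataset and manual_batches are hypothetical.

```python
import torch
from torch.utils.data import Dataset


class RangeDataset(Dataset):
    def __init__(self, n):
        self.data = list(range(n))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Explicitly reject indices outside 0..len-1
        if not 0 <= idx < len(self.data):
            raise IndexError(f"index {idx} out of range")
        return torch.tensor(self.data[idx])


def manual_batches(dataset, batch_size):
    indices = list(range(len(dataset)))        # relies on __len__
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        samples = [dataset[i] for i in chunk]  # relies on __getitem__
        yield torch.stack(samples)


batches = [b.tolist() for b in manual_batches(RangeDataset(7), batch_size=3)]
print(batches)
```

If __len__ reported more items than the data actually holds, the loop would request an index that __getitem__ cannot serve and the IndexError would surface. The bounds check in this sketch also rejects negative indices, which is a choice made here rather than a PyTorch requirement.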
Summary
__len__ tells how many items are in your dataset.
__getitem__ returns one item by index.
Together, they let PyTorch load your data easily for training.