
__getitem__ and __len__ in PyTorch

Introduction

__len__ and __getitem__ tell PyTorch how many items your dataset class holds and how to fetch each one. Once both are defined, the dataset can be used directly in training.

Implement these methods:
When you create a custom dataset to feed data into a PyTorch model.
When you want to load images or text samples one by one during training.
When your data is not in a ready-made format and you need to control how it is accessed.
When you want to use PyTorch's DataLoader to automatically handle batching and shuffling.
Syntax
PyTorch
import torch

class YourDataset(torch.utils.data.Dataset):
    def __len__(self):
        # return number of items
        return number_of_items

    def __getitem__(self, index):
        # return one data item at position index
        return data_item

__len__ tells how many items are in your dataset.

__getitem__ returns one item when given an index number.
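
As a minimal sketch of how these two hooks are reached, the example below shows that Python's built-in len() and square-bracket indexing call __len__ and __getitem__ directly. The class name SquaresDataset and its contents are made up for illustration.

```python
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Hypothetical dataset holding the squares of 0..n-1."""

    def __init__(self, n):
        self.values = [i * i for i in range(n)]

    def __len__(self):
        # Called by len(dataset)
        return len(self.values)

    def __getitem__(self, index):
        # Called by dataset[index]
        return self.values[index]

dataset = SquaresDataset(5)
print(len(dataset))  # 5
print(dataset[3])    # 9
```

Nothing PyTorch-specific happens here yet; the same two methods are what DataLoader later relies on.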

Examples
This dataset has 100 items.
PyTorch
def __len__(self):
    return 100
Returns the data item at position idx.
PyTorch
def __getitem__(self, idx):
    return self.data[idx]
Returns an image and its label for training.
PyTorch
def __getitem__(self, idx):
    image = load_image(self.paths[idx])
    label = self.labels[idx]
    return image, label
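
The image example above depends on a load_image helper and file paths that are not shown. A self-contained sketch of the same (image, label) pattern might look like the following, assuming random tensors as stand-ins for loaded images; FakeImageDataset and its tensor sizes are hypothetical.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FakeImageDataset(Dataset):
    """Hypothetical stand-in: random tensors instead of images read from disk."""

    def __init__(self, num_samples=8):
        # Each "image" is a 3x4x4 tensor; labels alternate between 0 and 1.
        self.images = [torch.rand(3, 4, 4) for _ in range(num_samples)]
        self.labels = [i % 2 for i in range(num_samples)]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Return one (image, label) pair, as in the example above.
        return self.images[idx], self.labels[idx]

loader = DataLoader(FakeImageDataset(), batch_size=4)
images, labels = next(iter(loader))
print(images.shape)  # torch.Size([4, 3, 4, 4])
print(labels)        # tensor([0, 1, 0, 1])
```

When __getitem__ returns a tuple, DataLoader's default collation batches each position separately, so the images are stacked into one tensor and the labels into another.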
Sample Program

This code creates a dataset holding the numbers 0 to 9. Its __getitem__ returns double the stored number. The DataLoader groups the items into batches of 3 (the last batch has only one item), and the loop prints each batch.

PyTorch
import torch
from torch.utils.data import Dataset, DataLoader

class SimpleDataset(Dataset):
    def __init__(self):
        self.data = list(range(10))  # numbers 0 to 9

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx] * 2  # return double the number

# Create dataset and dataloader
dataset = SimpleDataset()
dataloader = DataLoader(dataset, batch_size=3, shuffle=False)

for batch in dataloader:
    print(batch)
Output
tensor([0, 2, 4])
tensor([ 6,  8, 10])
tensor([12, 14, 16])
tensor([18])
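
To make the role of the two methods concrete, the sketch below rebuilds the same batches by hand using only len(dataset) and dataset[i], then compares them with what DataLoader yields. It is a simplified illustration of the idea, not how DataLoader is actually implemented; the helper name manual_batches is made up.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SimpleDataset(Dataset):
    def __init__(self):
        self.data = list(range(10))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx] * 2

def manual_batches(dataset, batch_size):
    """Hypothetical helper: batch a dataset using only len() and indexing."""
    batches = []
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        batches.append(torch.tensor(items))
    return batches

dataset = SimpleDataset()
by_hand = manual_batches(dataset, batch_size=3)
from_loader = list(DataLoader(dataset, batch_size=3, shuffle=False))

# Print each hand-built batch next to the DataLoader batch.
for a, b in zip(by_hand, from_loader):
    print(a.tolist(), b.tolist())
```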
Important Notes

Always make sure __len__ matches the actual number of items your dataset can provide.

__getitem__ should handle any index from 0 to len(dataset)-1.

PyTorch's DataLoader relies on both methods: it uses __len__ to know which indices exist and __getitem__ to fetch the items it groups into batches, whether in order or shuffled.
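
The notes above can be sketched as follows, assuming a hypothetical CheckedDataset that rejects any index outside 0 to len(dataset)-1. With shuffle=True the order changes, but each valid index is still requested exactly once per pass over the data.

```python
from torch.utils.data import Dataset, DataLoader

class CheckedDataset(Dataset):
    """Hypothetical dataset that rejects indices outside 0..len-1."""

    def __init__(self, n):
        self.data = list(range(n))

    def __len__(self):
        # Must match the number of items __getitem__ can actually serve.
        return len(self.data)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self.data):
            raise IndexError(f"index {idx} out of range for {len(self.data)} items")
        return self.data[idx]

dataset = CheckedDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Collect every item from one shuffled pass and sort it for comparison.
seen = sorted(x for batch in loader for x in batch.tolist())
print(seen == list(range(10)))  # True
```

If __len__ reported more items than the dataset holds, the shuffled pass would eventually request an index that __getitem__ cannot serve, and the explicit check would raise an IndexError.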

Summary

__len__ tells how many items are in your dataset.

__getitem__ returns one item by index.

Together, they let PyTorch load your data easily for training.