PyTorch · ML · ~5 mins

__getitem__ and __len__ in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the purpose of the __getitem__ method in a PyTorch dataset?
The __getitem__ method returns one data sample given its index, typically as an (input, label) pair, so PyTorch can fetch individual samples during training or evaluation.
beginner
Why do we need to define __len__ in a PyTorch dataset?
The __len__ method reports how many samples the dataset contains. PyTorch uses this to know the range of valid indices, when an epoch ends, and how to shuffle the data.
intermediate
How does __getitem__ relate to the DataLoader in PyTorch?
The DataLoader calls __getitem__ to get each sample when making batches. It uses the index to fetch data one by one or in parallel.
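A minimal sketch of this interaction (the `SimpleDataset` class below is an illustrative stand-in for any `Dataset` subclass, not part of PyTorch itself):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SimpleDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(12).reshape(6, 2)   # 6 samples, 2 features each
        self.labels = torch.tensor([0, 1, 0, 1, 0, 1])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # The DataLoader calls this method once per index when assembling a batch
        return self.data[idx], self.labels[idx]

loader = DataLoader(SimpleDataset(), batch_size=2, shuffle=False)
for x_batch, y_batch in loader:
    print(x_batch.shape, y_batch.shape)  # torch.Size([2, 2]) torch.Size([2])
```

With `batch_size=2` and 6 samples, the DataLoader calls `__getitem__` six times per epoch and stacks the results into three batches.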
intermediate
What happens if __len__ returns the wrong number in a PyTorch dataset?
If __len__ is too small, PyTorch silently skips the extra samples; if it is too large, PyTorch requests indices that don't exist, causing index errors or incomplete training.
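As an illustration (the `BrokenDataset` class is invented for this sketch), a dataset whose __len__ overstates the real size raises an IndexError as soon as the DataLoader asks for a missing index:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class BrokenDataset(Dataset):
    def __init__(self):
        self.data = torch.tensor([10, 20, 30])  # only 3 real samples

    def __len__(self):
        return 5  # wrong on purpose: claims 5 samples

    def __getitem__(self, idx):
        return self.data[idx]  # only valid for idx 0, 1, 2

loader = DataLoader(BrokenDataset(), batch_size=1)
try:
    for sample in loader:
        pass
except IndexError as e:
    print("DataLoader hit a missing index:", e)
```

The opposite mistake (returning 2 instead of 3) would raise no error at all; the third sample would simply never be used, which is harder to notice.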
beginner
Show a simple example of a PyTorch dataset class with __getitem__ and __len__ methods.
```python
import torch
from torch.utils.data import Dataset

class SimpleDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = self.labels[idx]
        return x, y

# Example usage:
data = torch.tensor([[1, 2], [3, 4], [5, 6]])
labels = torch.tensor([0, 1, 0])
dataset = SimpleDataset(data, labels)
print(len(dataset))  # Output: 3
print(dataset[1])    # Output: (tensor([3, 4]), tensor(1))
```
What does the __getitem__ method return in a PyTorch dataset?
A. The model's prediction
B. A single data sample and its label
C. The batch size
D. The total number of samples
Answer: B
Why is __len__ important in a PyTorch dataset?
A. It trains the model
B. It loads the data from disk
C. It normalizes the data
D. It tells how many samples are in the dataset
Answer: D
If __len__ returns 100, what does that mean?
A. The model has 100 layers
B. The batch size is 100
C. There are 100 samples in the dataset
D. The dataset is empty
Answer: C
What happens if you try to access an index outside the range in __getitem__?
A. You get an error
B. You get a random sample
C. PyTorch automatically fixes it
D. Nothing happens
Answer: A
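This can be checked directly with the `SimpleDataset` class from the example above, repeated here so the snippet is self-contained:

```python
import torch
from torch.utils.data import Dataset

class SimpleDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = SimpleDataset(torch.tensor([[1, 2], [3, 4]]), torch.tensor([0, 1]))
try:
    dataset[5]  # only indices 0 and 1 exist
except IndexError as e:
    print("Out-of-range access fails:", e)
```

PyTorch does not guard the index for you; the underlying tensor (or list) raises the error when `__getitem__` indexes past its end.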
Which PyTorch class do you usually inherit to create a custom dataset with __getitem__ and __len__?
A. torch.utils.data.Dataset
B. torch.nn.Module
C. torch.optim.Optimizer
D. torch.Tensor
Answer: A
Explain in your own words why __getitem__ and __len__ are essential for PyTorch datasets.
Hint: think about how PyTorch gets data samples and knows when to stop.

Describe what could go wrong if __len__ returns a smaller or larger number than the actual dataset size.
Hint: consider what happens when PyTorch tries to access data beyond the dataset.