
__getitem__ and __len__ in PyTorch - Model Pipeline Trace

Model Pipeline - __getitem__ and __len__

This pipeline shows how a PyTorch Dataset uses __getitem__ and __len__ to provide data samples for training a model. The dataset gives data one sample at a time, and the model learns from these samples.

Data Flow - 3 Stages

Stage 1: Raw Data
Input: original dataset with features and labels
Output: 1000 rows x 5 columns
[[0.5, 1.2, 3.3, 0.7, 1], [1.1, 0.4, 2.2, 1.5, 0], ...]

Stage 2: __len__ method
Input: the dataset (1000 rows x 5 columns)
Output: Integer: 1000 (the total number of samples in the dataset)
len(dataset) -> 1000

Stage 3: __getitem__ method
Input: an index integer (e.g., 0)
Output: one data sample at that index, a tuple (features: 4 values, label: 1 value)
dataset[0] -> (tensor([0.5, 1.2, 3.3, 0.7]), tensor(1))
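The three stages above can be sketched as a class implementing the Dataset protocol. This is a minimal pure-Python sketch: plain lists stand in for torch tensors, and the class name `ToyDataset` and its two example rows are illustrative, not from a real dataset.

```python
# Minimal sketch of the Dataset protocol, assuming plain Python lists
# stand in for torch tensors; names and sample values are illustrative.

class ToyDataset:
    """Each row holds 4 feature values followed by 1 label."""

    def __init__(self, rows):
        self.rows = rows  # list of [f1, f2, f3, f4, label]

    def __len__(self):
        # Stage 2: total number of samples in the dataset
        return len(self.rows)

    def __getitem__(self, index):
        # Stage 3: one sample (features, label) at the given index
        row = self.rows[index]
        return row[:4], row[4]

dataset = ToyDataset([[0.5, 1.2, 3.3, 0.7, 1],
                      [1.1, 0.4, 2.2, 1.5, 0]])
print(len(dataset))  # -> 2
print(dataset[0])    # -> ([0.5, 1.2, 3.3, 0.7], 1)
```

Because `__len__` and `__getitem__` follow Python's sequence protocol, `len(dataset)` and `dataset[i]` work without subclassing anything; in real code you would subclass `torch.utils.data.Dataset` and return tensors.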
Training Trace - Epoch by Epoch
Loss
1.0 | *
0.9 |  *
0.8 |   *
0.7 |    *
0.6 |     *
0.5 |      *
0.4 |       *
0.3 |        *
    +----------------
     1 2 3 4 5 Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
------+--------+------------+--------------------------------------------------
1     | 0.85   | 0.60       | Model starts learning with moderate loss and accuracy.
2     | 0.65   | 0.72       | Loss decreases and accuracy improves as the model learns.
3     | 0.50   | 0.80       | Model continues to improve with more training.
4     | 0.40   | 0.85       | Loss decreases further, accuracy rises.
5     | 0.35   | 0.88       | Training converges with good accuracy.
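Each epoch in the trace above is one full pass over the dataset, driven entirely by the two methods: `__len__` bounds the loop and `__getitem__` fetches each sample. A hedged pure-Python sketch (the `ToyDataset` class, the squared-error loss, and the constant "model" are toy stand-ins, not the real training setup):

```python
# Sketch of how one epoch walks a dataset via __len__ and __getitem__;
# the dataset, model, and loss here are illustrative stand-ins.

class ToyDataset:
    def __init__(self, rows):
        self.rows = rows
    def __len__(self):
        return len(self.rows)
    def __getitem__(self, index):
        row = self.rows[index]
        return row[:-1], row[-1]

def epoch_loss(dataset, predict):
    """Mean squared error over one full pass of the dataset."""
    total = 0.0
    for i in range(len(dataset)):        # __len__ bounds the loop
        features, label = dataset[i]     # __getitem__ fetches one sample
        total += (predict(features) - label) ** 2
    return total / len(dataset)

dataset = ToyDataset([[0.5, 1.2, 3.3, 0.7, 1],
                      [1.1, 0.4, 2.2, 1.5, 0]])
# A trivial "model" that always predicts 0.5:
print(epoch_loss(dataset, lambda f: 0.5))  # -> 0.25
```

In real PyTorch a `DataLoader` performs this iteration, calling `__len__` and `__getitem__` behind the scenes to batch and shuffle samples, and the loss would fall across epochs as the model's parameters update.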
Prediction Trace - 5 Layers
Layer 1: __getitem__ call
Layer 2: Model input layer
Layer 3: Hidden layers
Layer 4: Output layer
Layer 5: Prediction
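The five layers above can be mirrored with a tiny forward pass. This is an illustrative sketch with made-up weights: `relu` and `linear` are hand-rolled stand-ins for `torch.relu` and `nn.Linear`, and the sample is the same `dataset[0]` features shown earlier.

```python
# Illustrative forward pass mirroring the 5 prediction layers;
# all weights and biases below are made up for the example.

def relu(x):
    return [max(0.0, v) for v in x]

def linear(x, weights, bias):
    # One dense layer: each output is a weighted sum of inputs plus a bias
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

features = [0.5, 1.2, 3.3, 0.7]           # Layer 1: features from dataset[0]
hidden = relu(linear(features,            # Layers 2-3: input -> hidden
                     [[0.1, 0.2, 0.0, 0.3],
                      [0.0, 0.1, 0.2, 0.1]],
                     [0.0, 0.0]))
logits = linear(hidden,                   # Layer 4: output layer
                [[0.5, -0.5],
                 [-0.5, 0.5]],
                [0.0, 0.0])
prediction = max(range(2), key=lambda c: logits[c])  # Layer 5: argmax class
print(prediction)  # -> 1
```

With these particular weights the sample is classified as class 1, matching the label of `dataset[0]` in the pipeline trace; a trained model arrives at such weights through the epoch-by-epoch loss reduction shown above.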
Model Quiz - 3 Questions
Test your understanding

Q1. What does the __len__ method return in a PyTorch Dataset?
A. The number of features in each sample
B. The total number of samples in the dataset
C. The label of the first sample
D. The batch size used in training
Key Insight
The __len__ and __getitem__ methods let a PyTorch Dataset behave like a list: __len__ tells the training loop (or a DataLoader) how many samples exist, and __getitem__ fetches any sample by index. This lets batching and shuffling work without the training loop knowing how the data is stored, enabling smooth model training.