
Custom Dataset class in PyTorch - Model Metrics & Evaluation

Which metric matters for a custom Dataset class, and why

When you use a custom dataset class in PyTorch, the main goal is to load your data correctly and efficiently. Neither property is a model metric, but the two things worth measuring are data integrity and loading speed. Confirm that the class returns the right samples and labels for every index, without errors, and fast enough that the training loop is not left waiting for data. Only then can you trust that the model is training on the inputs you intended.

Confusion matrix or equivalent visualization

Since a custom dataset class is about data handling, not predictions, a confusion matrix does not apply here. Instead, you can check your dataset by printing sample outputs and their labels to verify correctness.

Sample output from dataset:
Index: 0
Image shape: (3, 224, 224)
Label: 2

Index: 1
Image shape: (3, 224, 224)
Label: 0

... (and so on)
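A check like the one above can be sketched in a few lines. This is a minimal example, not a definitive implementation: ToyImageDataset and describe_samples are hypothetical names, and the images are random tensors standing in for real data.

```python
import torch
from torch.utils.data import Dataset


class ToyImageDataset(Dataset):
    """Hypothetical stand-in for a real image dataset: random images, fixed labels."""

    def __init__(self, labels):
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # Seed per index so the same index always returns the same image.
        generator = torch.Generator().manual_seed(index)
        image = torch.rand(3, 224, 224, generator=generator)
        return image, self.labels[index]


def describe_samples(dataset, count=2):
    """Return three summary lines per sample for the first `count` indexes."""
    lines = []
    for index in range(min(count, len(dataset))):
        image, label = dataset[index]
        lines.append(f"Index: {index}")
        lines.append(f"Image shape: {tuple(image.shape)}")
        lines.append(f"Label: {label}")
    return lines


dataset = ToyImageDataset(labels=[2, 0, 1])
print("\n".join(describe_samples(dataset)))
```

With a real dataset, you would compare the printed labels against the source files or annotation table for the same indexes.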
Tradeoff: Data loading speed vs memory usage

When designing a custom dataset, you often trade off between loading all data into memory up front (fast access but high memory use) and loading each sample on demand (low memory use but slower access). For example, loading all images at once speeds up training but needs more RAM. Loading images one by one saves memory but can slow training if disk access is slow.
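The two strategies can be sketched side by side. In this example, load_sample is a hypothetical function standing in for reading one image from disk, and the class names EagerDataset and LazyDataset are illustrative only.

```python
import torch
from torch.utils.data import Dataset


def load_sample(index):
    """Hypothetical loader standing in for reading one image from disk."""
    return torch.full((3, 8, 8), float(index))


class EagerDataset(Dataset):
    """Loads every sample in __init__: fast access, memory grows with dataset size."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.images = [load_sample(i) for i in range(len(self.labels))]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return self.images[index], self.labels[index]


class LazyDataset(Dataset):
    """Loads one sample per __getitem__ call: low memory, load cost on every access."""

    def __init__(self, labels):
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return load_sample(index), self.labels[index]


labels = [2, 0, 1, 1]
eager, lazy = EagerDataset(labels), LazyDataset(labels)
# Both strategies must return identical samples; only the cost profile differs.
for i in range(len(labels)):
    assert torch.equal(eager[i][0], lazy[i][0]) and eager[i][1] == lazy[i][1]
```

With the lazy variant, a DataLoader configured with num_workers greater than zero can often hide part of the per-sample load cost by reading in background processes.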

What "good" vs "bad" looks like for a custom dataset class

Good: Dataset returns correct data and labels, no crashes, consistent data shapes, and loads data quickly enough to keep training smooth.

Bad: Dataset returns wrong labels, crashes on some indexes, inconsistent data shapes, or is too slow causing training to stall.
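These criteria can be turned into a quick automated check. The sketch below assumes each item is a (tensor, integer label) pair; check_dataset and its parameters are hypothetical names, and the timing it reports is only a rough per-sample figure.

```python
import time

import torch
from torch.utils.data import TensorDataset


def check_dataset(dataset, num_classes, max_samples=100):
    """Scan up to max_samples items; return (problems, seconds_per_sample)."""
    problems = []
    expected_shape = None
    count = min(max_samples, len(dataset))
    start = time.perf_counter()
    for index in range(count):
        try:
            data, label = dataset[index]
        except Exception as error:  # a crash on any index is a "bad" sign
            problems.append(f"index {index}: raised {type(error).__name__}")
            continue
        if expected_shape is None:
            expected_shape = tuple(data.shape)
        elif tuple(data.shape) != expected_shape:
            problems.append(f"index {index}: shape {tuple(data.shape)} != {expected_shape}")
        if not 0 <= int(label) < num_classes:
            problems.append(f"index {index}: label {int(label)} out of range")
    elapsed = time.perf_counter() - start
    return problems, elapsed / max(count, 1)


good = TensorDataset(torch.rand(10, 3, 8, 8), torch.randint(0, 3, (10,)))
problems, seconds_per_sample = check_dataset(good, num_classes=3)
print(problems, f"{seconds_per_sample:.6f} s/sample")
```

An empty problems list covers the "no crashes" and "consistent shapes" criteria. Whether the measured time is fast enough depends on how long one training step takes on your hardware.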

Common pitfalls when creating a custom dataset class
  • Not implementing __len__ or __getitem__ correctly, causing errors.
  • Returning data in the wrong format or shape, which causes shape errors or silently degrades training.
  • Mixing up the order of data and labels, so samples are paired with the wrong label.
  • Loading all data into memory unintentionally, causing crashes.
  • Not handling file paths or corrupt files gracefully.
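A dataset that avoids several of these pitfalls might look like the sketch below. It assumes samples are tensors saved with torch.save, which is an illustrative choice rather than anything the lesson prescribes; TensorFileDataset is a hypothetical name.

```python
import os
import tempfile

import torch
from torch.utils.data import Dataset


class TensorFileDataset(Dataset):
    """Hypothetical dataset over (path, label) pairs saved with torch.save."""

    def __init__(self, samples):
        # Keeping each path next to its label avoids mixing up their order.
        self.samples = list(samples)
        missing = [path for path, _ in self.samples if not os.path.isfile(path)]
        if missing:
            raise FileNotFoundError(f"{len(missing)} missing file(s), first: {missing[0]}")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        try:
            data = torch.load(path)  # loaded on demand, not in __init__
        except Exception as error:
            # Report which file failed instead of a bare deserialization error.
            raise RuntimeError(f"could not read sample {index} at {path}") from error
        return data, label


with tempfile.TemporaryDirectory() as folder:
    good_path = os.path.join(folder, "good.pt")
    bad_path = os.path.join(folder, "bad.pt")
    torch.save(torch.ones(3, 8, 8), good_path)
    with open(bad_path, "wb") as handle:
        handle.write(b"not a tensor file")

    dataset = TensorFileDataset([(good_path, 1), (bad_path, 0)])
    data, label = dataset[0]
    try:
        dataset[1]
        corrupt_detected = False
    except RuntimeError:
        corrupt_detected = True
```

Raising an error that names the file is one design choice. Another is to skip unreadable files when building the sample list, at the cost of silently shrinking the dataset.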
Self-check question

Your custom dataset class loads images and labels. You notice training is very slow and sometimes crashes with memory errors. What might be wrong?

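Answer: Your dataset might be loading all data into memory at once, using too much RAM. That explains the memory errors, and the slowdown can follow once the system runs low on memory and starts swapping. Consider loading each sample on demand in __getitem__ to reduce memory use.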

Key Result
For a custom dataset class, the key metric is correct and efficient data loading to ensure smooth model training.