Image dataset from folders in PyTorch - Model Metrics & Evaluation

When using image datasets loaded from folders, the key metrics depend on the task. For classification, accuracy shows what fraction of images are labeled correctly. When classes are imbalanced, however, precision and recall become important for understanding how well the model handles each class. These metrics also help verify that loading and labeling from folders worked correctly and that the model is actually learning.
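To make the folder-to-label step concrete, here is a minimal pure-Python sketch of the labeling rule that `torchvision.datasets.ImageFolder` applies: each subfolder of the root directory becomes one class, and classes are mapped to integer indices in alphabetical order. The directory names (`cat`, `dog`) and file names are illustrative only.

```python
import os
import tempfile

# Mimic torchvision.datasets.ImageFolder's labeling rule:
# each subfolder of the root is one class; class names are sorted
# alphabetically and mapped to integer indices.
def folder_labels(root):
    classes = sorted(
        d for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {c: i for i, c in enumerate(classes)}
    samples = []
    for c in classes:
        for fname in sorted(os.listdir(os.path.join(root, c))):
            samples.append((os.path.join(root, c, fname), class_to_idx[c]))
    return class_to_idx, samples

# Demo with a throwaway directory tree: root/cat/*.jpg, root/dog/*.jpg
root = tempfile.mkdtemp()
for cls, n in [("cat", 2), ("dog", 3)]:
    os.makedirs(os.path.join(root, cls))
    for i in range(n):
        open(os.path.join(root, cls, f"{i}.jpg"), "w").close()

class_to_idx, samples = folder_labels(root)
print(class_to_idx)   # {'cat': 0, 'dog': 1}
print(len(samples))   # 5
```

If a folder is misnamed or images land in the wrong subfolder, every sample in it gets the wrong label, which is exactly the kind of problem the metrics below help detect.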
A confusion matrix for a two-class cat/dog test set of 100 images:

|            | Predicted Cat | Predicted Dog |
|------------|---------------|---------------|
| Actual Cat | 45            | 5             |
| Actual Dog | 3             | 47            |

For the cat class: TP = 45, FP = 3, FN = 5, TN = 47 (total samples = 100). This matrix shows how many images were classified correctly or incorrectly after loading from folders.
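Using the cat-class counts from the matrix above, accuracy, precision, and recall can be computed directly (a minimal sketch; the numbers are exactly those in the table):

```python
# Confusion-matrix counts for the "cat" class (from the table above).
tp, fp, fn, tn = 45, 3, 5, 47

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct predictions / total
precision = tp / (tp + fp)                   # of predicted cats, how many are cats
recall = tp / (tp + fn)                      # of actual cats, how many were found

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.2f}")
# accuracy=0.92 precision=0.938 recall=0.90
```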
Precision tells us how many of the images predicted as a class actually belong to that class. Recall tells us how many images of a class the model managed to find.
Example: If you have a folder with many dog images but few cat images, high recall for cats means the model finds most of the cat images; high precision means that when the model says "cat," it is usually correct.
Depending on the use case, you may want to optimize for one over the other. In a pet-labeling app, for example, high precision avoids wrong labels; in wildlife monitoring, high recall matters so that rare animals are not missed.
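The per-class definitions above can be sketched as a small helper that takes parallel lists of actual and predicted labels (the function name and the imbalanced toy data are hypothetical):

```python
from collections import Counter

def per_class_metrics(actual, predicted):
    """Per-class precision and recall from parallel label lists."""
    classes = set(actual) | set(predicted)
    tp = Counter()                      # correct predictions per class
    pred_count = Counter(predicted)     # how often each class was predicted
    actual_count = Counter(actual)      # how often each class truly occurs
    for a, p in zip(actual, predicted):
        if a == p:
            tp[a] += 1
    return {
        c: {
            "precision": tp[c] / pred_count[c] if pred_count[c] else 0.0,
            "recall": tp[c] / actual_count[c] if actual_count[c] else 0.0,
        }
        for c in classes
    }

# Imbalanced toy run: many dogs, few cats.
actual    = ["dog"] * 8 + ["cat"] * 2
predicted = ["dog"] * 7 + ["cat"] + ["cat", "dog"]
m = per_class_metrics(actual, predicted)
print(m["cat"])   # {'precision': 0.5, 'recall': 0.5}
print(m["dog"])   # {'precision': 0.875, 'recall': 0.875}
```

Note how the minority class (cat) scores much worse than the majority class even though only two of ten predictions are wrong.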
Good: Accuracy above 90%, with precision and recall above 85% for all classes, suggests the dataset was loaded from folders correctly and the model is learning well.
Bad: Accuracy below 70%, or very low precision/recall for some classes, points to problems such as mislabeled folders, imbalanced data, or model issues.
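These rules of thumb can be turned into a quick automated check. The function below is a hypothetical sketch that flags any metric falling below the thresholds just described (0.70 overall accuracy, 0.85 per-class precision/recall):

```python
# Hypothetical health check using the thresholds above: flag a run as
# problematic if accuracy < 0.70 or any class has precision or recall
# below 0.85.
def dataset_health(accuracy, per_class):
    issues = []
    if accuracy < 0.70:
        issues.append(f"overall accuracy {accuracy:.2f} below 0.70")
    for cls, m in per_class.items():
        for metric in ("precision", "recall"):
            if m[metric] < 0.85:
                issues.append(f"{cls}: {metric} {m[metric]:.2f} below 0.85")
    return issues   # an empty list means the loaded dataset looks healthy

print(dataset_health(0.92, {"cat": {"precision": 0.94, "recall": 0.90}}))
# []
print(dataset_health(0.65, {"cat": {"precision": 0.40, "recall": 0.30}}))
```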
- Accuracy paradox: High accuracy can be misleading if one class dominates the dataset loaded from folders.
- Data leakage: If the same images appear in both the training and test sets, metrics will be too optimistic.
- Overfitting indicators: Very high training accuracy but low test accuracy means the model is memorizing the training images instead of generalizing.
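The data-leakage pitfall is easy to guard against when samples come from folders: the same file path must never appear in both splits. A minimal sketch (the paths shown are illustrative):

```python
# Simple leakage check: the same image file must never appear in both
# the training and test splits.
def leaked_files(train_paths, test_paths):
    return sorted(set(train_paths) & set(test_paths))

train = ["data/cat/001.jpg", "data/cat/002.jpg", "data/dog/010.jpg"]
test  = ["data/cat/002.jpg", "data/dog/011.jpg"]

overlap = leaked_files(train, test)
print(overlap)   # ['data/cat/002.jpg'] -> metrics would be too optimistic
```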
Your model trained on images loaded from folders has 98% accuracy but only 12% recall on a rare class. Is it good for production? Why or why not?
Answer: No, it is not good. The model misses most images of the rare class (low recall), which means it fails to detect important cases even though overall accuracy is high.
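To see how both numbers can hold at once, here is one hypothetical set of counts chosen to reproduce the scenario (5000 test images with a 50-image rare class; all counts are assumptions for illustration):

```python
# Hypothetical counts chosen to reproduce the scenario: 5000 test images,
# a rare class with 50 examples, only 6 of which the model finds.
total = 5000
rare_total = 50
rare_found = 6                   # true positives on the rare class
other_errors = 56                # mistakes outside the rare class

missed_rare = rare_total - rare_found          # 44 false negatives
correct = total - missed_rare - other_errors   # 4900 correct predictions

accuracy = correct / total
rare_recall = rare_found / rare_total
print(f"accuracy={accuracy:.0%} rare-class recall={rare_recall:.0%}")
# accuracy=98% rare-class recall=12%
```

Because the rare class makes up only 1% of the test set, even missing 88% of it barely dents overall accuracy, which is the accuracy paradox in action.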