nn.Conv2d layers in PyTorch - Model Metrics & Evaluation

Metrics & Evaluation - nn.Conv2d layers
Which metric matters for nn.Conv2d layers and WHY

When using nn.Conv2d layers in a neural network, the goal is to learn useful features from images. The key metrics to check are training loss and validation accuracy: training loss shows how well the model fits the training data, while validation accuracy shows how well it predicts images it has not seen. Comparing training and validation metrics reveals signs of overfitting. For image classification, accuracy is the primary metric. For segmentation, Intersection over Union (IoU) is more informative, and for image generation or reconstruction a pixel-wise measure such as Mean Squared Error (MSE) is common. In every case, loss together with the task metric indicates whether the convolution layers are helping the model learn useful patterns.
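As a rough illustration of measuring loss and accuracy on held-out images, the sketch below evaluates a tiny classifier on one batch. The architecture, the `evaluate` helper, and the random stand-in data are all assumptions made for this example, not part of the lesson.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny classifier: one Conv2d layer followed by a linear head.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # (N, 8, H, W) -> (N, 8, 1, 1)
    nn.Flatten(),             # -> (N, 8)
    nn.Linear(8, 10),         # 10 classes, as in CIFAR-10
)

def evaluate(model, images, labels):
    """Return (mean cross-entropy loss, accuracy) for one batch."""
    model.eval()
    with torch.no_grad():
        logits = model(images)
        loss = nn.functional.cross_entropy(logits, labels).item()
        accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
    return loss, accuracy

# Random stand-in data; a real run would iterate over a validation loader.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))
val_loss, val_acc = evaluate(model, images, labels)
print(f"validation loss={val_loss:.3f}, accuracy={val_acc:.2%}")
```

Running the same helper on a training batch and a validation batch after each epoch gives the two curves that the rest of this section compares.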

Confusion matrix example for image classification with nn.Conv2d
      | Actual \ Predicted | Predicted Cat | Predicted Dog |
      |--------------------|---------------|---------------|
      | Actual Cat         | 50 (TP)       | 5 (FN)        |
      | Actual Dog         | 3 (FP)        | 42 (TN)       |

      (TP, FN, FP, TN are labelled with Cat as the positive class.)

      Total samples = 50 + 5 + 3 + 42 = 100

      Precision (Cat) = TP / (TP + FP) = 50 / (50 + 3) = 0.943
      Recall (Cat) = TP / (TP + FN) = 50 / (50 + 5) = 0.909

      Precision (Dog) = 42 / (42 + 5) = 0.894
      Recall (Dog) = 42 / (42 + 3) = 0.933
    

This confusion matrix shows how well the convolutional model classifies cats and dogs. Precision and recall help us understand errors in each class.
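The per-class calculation can be sketched in a few lines of plain Python. This assumes the rows of the matrix are the actual classes and the columns are the predicted classes; the `confusion` dictionary and `precision_recall` helper are names invented for the example.

```python
# Rows are actual classes, columns are predicted classes (assumed convention).
confusion = {
    "cat": {"cat": 50, "dog": 5},
    "dog": {"cat": 3, "dog": 42},
}

def precision_recall(confusion, cls):
    """Return (precision, recall) for one class of a confusion matrix."""
    tp = confusion[cls][cls]
    # False positives: other actual classes that were predicted as `cls`.
    fp = sum(confusion[other][cls] for other in confusion if other != cls)
    # False negatives: actual `cls` samples predicted as something else.
    fn = sum(confusion[cls][other] for other in confusion[cls] if other != cls)
    return tp / (tp + fp), tp / (tp + fn)

for cls in confusion:
    p, r = precision_recall(confusion, cls)
    print(f"{cls}: precision={p:.3f}, recall={r:.3f}")
```

Precision divides by a column total (everything predicted as the class), while recall divides by a row total (everything that really is the class).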

Precision vs Recall tradeoff with nn.Conv2d models

Imagine a model using nn.Conv2d layers to detect defects in product images.

  • High precision means most detected defects are real defects. This avoids wasting time fixing false alarms.
  • High recall means the model finds most real defects, even if some false alarms happen.

If the factory wants to avoid missing any defect, recall is more important. But if fixing false alarms is costly, precision matters more. The convolutional layers help extract features, but tuning the model affects this tradeoff.
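One common way to tune this tradeoff is to move the decision threshold applied to the model's predicted defect probability. The sketch below uses made-up scores and labels (both hypothetical) to show precision falling and recall rising as the threshold is lowered.

```python
# Hypothetical defect probabilities from a model, with true labels (1 = defect).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0]

def precision_recall_at(threshold):
    """Precision and recall when every score >= threshold is flagged as a defect."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for threshold in (0.85, 0.5, 0.25):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.85: precision=1.00, recall=0.50
# threshold=0.5: precision=0.60, recall=0.75
# threshold=0.25: precision=0.57, recall=1.00
```

A factory that cannot afford missed defects would pick the low threshold; one where false alarms are expensive would pick the high one.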

Good vs Bad metric values for nn.Conv2d layers in image classification

Good metrics:

  • Training loss steadily decreases and validation loss decreases or stabilizes.
  • Validation accuracy above 80% for simple datasets like CIFAR-10.
  • Balanced precision and recall above 0.8 for each class.

Bad metrics:

  • Training loss decreases but validation loss increases (overfitting).
  • Validation accuracy stuck near random guess (e.g., 10% for 10 classes).
  • Very low recall or precision for important classes.
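The overfitting pattern listed above can be checked mechanically from per-epoch loss histories. This is a minimal sketch: the `diagnose` helper, its `patience` parameter, and the loss values are all hypothetical, and real projects often use a more careful early-stopping rule.

```python
def diagnose(train_losses, val_losses, patience=3):
    """Flag overfitting when validation loss has not improved for
    `patience` epochs while training loss kept falling."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    epochs_since_best = len(val_losses) - 1 - best_epoch
    train_still_falling = train_losses[-1] < train_losses[best_epoch]
    if epochs_since_best >= patience and train_still_falling:
        return f"possible overfitting since epoch {best_epoch}"
    return "no overfitting signal yet"

# Hypothetical per-epoch losses (epochs are 0-indexed).
train = [1.9, 1.4, 1.0, 0.7, 0.5, 0.3, 0.2]
val = [1.8, 1.5, 1.2, 1.1, 1.2, 1.3, 1.5]
print(diagnose(train, val))  # possible overfitting since epoch 3
```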
Common pitfalls when evaluating nn.Conv2d models
  • Accuracy paradox: High accuracy can be misleading when classes are imbalanced. For example, if 90% of images belong to one class, a model that always predicts that class reaches 90% accuracy without learning anything.
  • Data leakage: If test images leak into training, metrics look unrealistically good.
  • Overfitting: The model memorizes training images but fails on new ones, which shows up as a gap between training and validation metrics.
  • Ignoring class imbalance: Not using precision and recall can hide poor performance on rare classes.
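The accuracy paradox is easy to reproduce with a degenerate "model" that always predicts the majority class. The label counts below are hypothetical and chosen only to match the 90/10 split in the example.

```python
# Hypothetical validation labels: 90 "ok" images and 10 "defect" images.
labels = ["ok"] * 90 + ["defect"] * 10

# A degenerate model that always predicts the majority class.
predictions = ["ok"] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_pos = sum(p == "defect" and y == "defect" for p, y in zip(predictions, labels))
defect_recall = true_pos / labels.count("defect")

print(f"accuracy={accuracy:.2f}, defect recall={defect_recall:.2f}")
# accuracy=0.90, defect recall=0.00
```

Accuracy alone looks acceptable here, while recall on the defect class exposes that no defect is ever found.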
Self-check question

Your convolutional model has 98% training accuracy but only 12% recall on the defect class in validation. Is it good for production? Why or why not?

Answer: No, it is not good. The model performs well on the training data but finds only 12% of the defects in new images. Missing most defects is risky in production. The gap suggests overfitting, possibly combined with class imbalance, so the model needs better training or more representative data.

Key Result
For nn.Conv2d layers, monitoring training loss and validation accuracy, together with per-class precision and recall, helps confirm that the model is learning useful image features rather than overfitting.

Practice

1. What does the nn.Conv2d layer in PyTorch primarily do?
easy
A. It increases the image size by adding pixels.
B. It slides filters over images to find patterns.
C. It converts images to grayscale.
D. It sorts images by color intensity.

Solution

  1. Step 1: Understand the role of convolution layers

    Convolution layers slide small filters over input images to detect features like edges or textures.
  2. Step 2: Match the function to the options

    Only option B, "It slides filters over images to find patterns.", correctly describes this sliding filter action. The other options describe unrelated image operations.
  3. Final Answer:

    It slides filters over images to find patterns. -> Option B
  4. Quick Check:

    Convolution = sliding filters [OK]
Hint: Conv2d = sliding filters over images to find features [OK]
Common Mistakes:
  • Thinking Conv2d changes image size by adding pixels
  • Confusing Conv2d with image color adjustments
  • Assuming Conv2d sorts or rearranges pixels
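To make "sliding a filter" concrete, here is a plain-Python sketch of the core operation for a single channel, with stride 1 and no padding. The `slide_filter` helper and the hand-picked edge kernel are illustrative assumptions; nn.Conv2d additionally handles batches, multiple channels, a bias term, and learns its kernel weights during training.

```python
def slide_filter(image, kernel):
    """Slide `kernel` over `image` (2D lists), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# 4x4 image: dark left half (0), bright right half (1).
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked vertical-edge filter; in nn.Conv2d these weights are learned.
kernel = [[-1, 1],
          [-1, 1]]
result = slide_filter(image, kernel)
for row in result:
    print(row)
# [0, 2, 0]
# [0, 2, 0]
# [0, 2, 0]
```

The filter responds only in the middle column, where the image changes from dark to bright, which is exactly the kind of pattern detection the question describes.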
2. Which of the following is the correct way to create a Conv2d layer with 3 input channels, 16 output channels, and a 3x3 kernel in PyTorch?
easy
A. nn.Conv2d(3, 16, kernel_size=3)
B. nn.Conv2d(16, 3, kernel_size=3)
C. nn.Conv2d(3, 16, kernel=3)
D. nn.Conv2d(input=3, output=16, size=3)

Solution

  1. Step 1: Recall Conv2d constructor parameters

    The correct order is nn.Conv2d(in_channels, out_channels, kernel_size).
  2. Step 2: Check each option

    nn.Conv2d(3, 16, kernel_size=3) matches the correct parameter order and uses the correct keyword for kernel size. The other options have wrong parameter order or incorrect keywords.
  3. Final Answer:

    nn.Conv2d(3, 16, kernel_size=3) -> Option A
  4. Quick Check:

    Conv2d(in, out, kernel_size) = A [OK]
Hint: Remember Conv2d(in_channels, out_channels, kernel_size) [OK]
Common Mistakes:
  • Swapping input and output channels
  • Using wrong parameter names like 'kernel' instead of 'kernel_size'
  • Passing parameters as keywords not supported by Conv2d
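A quick way to confirm the constructor signature is to build the layer and inspect it. The sketch below also tries the misspelled keyword from option C; the `rejected` flag is a name introduced only for this example.

```python
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)
print(conv.in_channels, conv.out_channels, conv.kernel_size)  # 3 16 (3, 3)
# Weight layout is (out_channels, in_channels, kernel_h, kernel_w).
print(tuple(conv.weight.shape))  # (16, 3, 3, 3)

# An unsupported keyword such as `kernel` is rejected at construction time.
try:
    nn.Conv2d(3, 16, kernel=3)
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```

Note that swapping the channel arguments, as in option B, would construct without complaint and only fail later when a 3-channel image is passed in.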
3. What will be the output shape of the following PyTorch Conv2d layer when applied to an input tensor of shape (1, 3, 32, 32)?
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 6, kernel_size=5)
output = conv(torch.randn(1, 3, 32, 32))
print(output.shape)
medium
A. torch.Size([1, 3, 28, 28])
B. torch.Size([1, 6, 32, 32])
C. torch.Size([6, 3, 28, 28])
D. torch.Size([1, 6, 28, 28])

Solution

  1. Step 1: Calculate output spatial size

    With the default stride=1 and padding=0, output size = input size - kernel size + 1 = 32 - 5 + 1 = 28 for both height and width.
  2. Step 2: Determine output channels and batch size

    Output channels = 6, batch size = 1, so output shape is (1, 6, 28, 28).
  3. Final Answer:

    torch.Size([1, 6, 28, 28]) -> Option D
  4. Quick Check:

    Output shape = (batch, out_channels, 28, 28) [OK]
Hint: Output size = input - kernel + 1 if stride=1, padding=0 [OK]
Common Mistakes:
  • Assuming output size equals input size without padding
  • Mixing up input and output channels in shape
  • Forgetting batch size dimension
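The "input - kernel + 1" rule can be checked against PyTorch for a few kernel sizes. This is a small sketch under the default stride=1 and padding=0; the `shapes` dictionary is just a container invented for the example.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)

shapes = {}
for k in (3, 5, 7):
    conv = nn.Conv2d(3, 6, kernel_size=k)  # stride=1, padding=0 by default
    shapes[k] = tuple(conv(x).shape)
    expected_side = 32 - k + 1
    print(f"kernel={k}: shape={shapes[k]}, expected side={expected_side}")
```

In each case the batch dimension stays at 1, the channel dimension becomes out_channels (6), and only the spatial dimensions shrink.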
4. Identify the problem in this Conv2d layer definition (the code runs, but one setting is questionable):
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=3)
output = conv(torch.randn(1, 3, 28, 28))
print(output.shape)
medium
A. Stride cannot be 2 in Conv2d.
B. Input tensor shape is incorrect for 3 input channels.
C. Padding is too large causing output size to increase unexpectedly.
D. Kernel size must be an odd number.

Solution

  1. Step 1: Calculate output size with given parameters

    Output size formula: floor((Input + 2*padding - kernel_size)/stride) + 1 = floor((28 + 6 - 3)/2) + 1 = floor(31/2) + 1 = 15 + 1 = 16.
  2. Step 2: Understand padding effect

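    Padding=3 is larger than a 3x3 kernel needs. With padding=1 the output would be 14x14; with padding=3 it grows to 16x16, and the first output row and column are computed entirely from zero padding. PyTorch accepts this configuration without raising an error, so the problem is an unintended design choice rather than a crash.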
  3. Final Answer:

    Padding is too large causing output size to increase unexpectedly. -> Option C
  4. Quick Check:

    Large padding inflates output size [OK]
Hint: Check padding size relative to kernel size for output shape [OK]
Common Mistakes:
  • Thinking stride=2 is invalid
  • Assuming input shape is wrong for 3 channels
  • Believing kernel size must be odd always
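The output-size formula used in this solution is simple enough to wrap in a helper and compare padding values side by side. `conv_output_size` is a hypothetical helper written for this example, and it assumes a dilation of 1.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """floor((size + 2*padding - kernel_size) / stride) + 1, assuming dilation=1."""
    return (size + 2 * padding - kernel_size) // stride + 1

for padding in (0, 1, 3):
    side = conv_output_size(28, kernel_size=3, stride=2, padding=padding)
    print(f"padding={padding}: {side}x{side}")
# padding=0: 13x13
# padding=1: 14x14
# padding=3: 16x16
```

With stride 2 the usual expectation is roughly half the input size (14x14 here), so the 16x16 result is the hint that the padding is larger than intended.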
5. You want to design a Conv2d layer that keeps the input image size (28x28) unchanged after convolution with a 5x5 kernel and stride 1. Which padding value should you use?
hard
A. Padding = 2
B. Padding = 1
C. Padding = 0
D. Padding = 3

Solution

  1. Step 1: Use output size formula for Conv2d

    Output size = floor((Input + 2*padding - kernel_size)/stride) + 1. We want output = input = 28, stride=1, kernel=5.
  2. Step 2: Solve for padding

    28 = (28 + 2*padding - 5) + 1 -> 28 = 24 + 2*padding -> 2*padding = 4 -> padding = 2.
  3. Final Answer:

    Padding = 2 -> Option A
  4. Quick Check:

    Padding 2 keeps size with 5x5 kernel [OK]
Hint: Padding = (kernel_size - 1) / 2 keeps the size when stride is 1 and the kernel size is odd [OK]
Common Mistakes:
  • Using zero padding and expecting same size
  • Choosing padding less than 2 for 5x5 kernel
  • Confusing stride effect with padding
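The padding answer can be verified directly. The sketch below assumes a single-channel 28x28 input (the question does not state the channel count) and also shows the string form `padding="same"`, which recent PyTorch versions (1.9 and later) accept for stride-1 convolutions.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # single channel is an assumption for this example

explicit = nn.Conv2d(1, 1, kernel_size=5, stride=1, padding=2)
named = nn.Conv2d(1, 1, kernel_size=5, stride=1, padding="same")

explicit_shape = tuple(explicit(x).shape)
named_shape = tuple(named(x).shape)
print(explicit_shape)  # (1, 1, 28, 28)
print(named_shape)     # (1, 1, 28, 28)
```

Writing the padding as a number makes the arithmetic visible, while `padding="same"` avoids recomputing it when the kernel size changes.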