
CNN architecture for image classification in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main purpose of a Convolutional Neural Network (CNN) in image classification?
A CNN automatically learns to detect important features like edges, shapes, and textures from images to classify them into categories.
beginner
What does a convolutional layer do in a CNN?
It applies small filters to the input image to create feature maps that highlight important patterns like edges or textures.
beginner
Why do CNNs use pooling layers?
Pooling layers reduce the size of feature maps, making the model faster and helping it focus on the most important features.
beginner
What role does the fully connected layer play in a CNN for image classification?
It takes the extracted features and decides which class the image belongs to by combining all the information.
beginner
How is accuracy calculated during CNN training for image classification?
Accuracy is the percentage of images the CNN correctly classifies out of all images tested.
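The accuracy definition above can be sketched in a few lines. This is a minimal illustration, assuming a batch of hypothetical logits and labels (the values are made up for the example, not taken from a trained model):

```python
import torch

# Hypothetical raw model outputs (logits) for 4 images and 3 classes.
logits = torch.tensor([
    [2.0, 0.5, 0.1],   # predicted class 0
    [0.2, 1.5, 0.3],   # predicted class 1
    [0.1, 0.2, 3.0],   # predicted class 2
    [1.2, 0.9, 0.4],   # predicted class 0
])
labels = torch.tensor([0, 1, 2, 1])  # true classes; the last image is misclassified

predictions = logits.argmax(dim=1)            # index of the largest logit per image
correct = (predictions == labels).sum().item()
accuracy = 100.0 * correct / labels.size(0)   # percentage of correct predictions
print(f"{correct}/{labels.size(0)} correct, accuracy = {accuracy:.1f}%")
```

Here 3 of the 4 predictions match the labels, so the accuracy is 75%.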
What is the first layer usually used in a CNN for image classification?
A. Convolutional layer
B. Pooling layer
C. Fully connected layer
D. Dropout layer
Which layer reduces the spatial size of the feature maps?
A. Convolutional layer
B. Batch normalization layer
C. Fully connected layer
D. Pooling layer
What does the output layer of a CNN for classification usually use?
A. Softmax activation
B. Sigmoid activation
C. ReLU activation
D. Tanh activation
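To make the softmax option concrete, here is a small plain-Python sketch with hypothetical scores for three classes. Softmax turns raw scores into probabilities that sum to 1 without changing which class scores highest. Note that in PyTorch, nn.CrossEntropyLoss applies log-softmax internally, which is why classification models commonly return raw scores from their last linear layer during training.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    shifted = [z - largest for z in logits]   # subtract the max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # hypothetical scores for 3 classes
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```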
Which metric tells how many images were correctly classified?
A. Loss
B. Accuracy
C. Precision
D. Recall
What is the main advantage of using convolutional layers over fully connected layers for images?
A. They remove noise from images
B. They increase the image size
C. They reduce the number of parameters by sharing weights
D. They convert images to text
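The effect of weight sharing can be checked with simple arithmetic. The sketch below assumes a 3x32x32 input mapped to 16 feature maps of 32x32 (sizes chosen only for illustration) and compares a 3x3 convolution with a fully connected layer producing the same number of outputs. The helper names are hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a square-kernel Conv2d."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features

# Assumed example: 3x32x32 input mapped to 16 feature maps of 32x32.
conv_count = conv2d_params(3, 16, 3)
fc_count = linear_params(3 * 32 * 32, 16 * 32 * 32)
print(conv_count)   # 448
print(fc_count)     # 50348032
```

The convolution reuses the same 448 parameters at every image position, while the fully connected layer needs a separate weight for every input-output pair.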
Explain the main components of a CNN architecture used for image classification and their roles.
Think about how the network processes images step-by-step.
Describe how accuracy is calculated during CNN training and why it is important.
Consider what accuracy tells you about the model's performance.

      Practice

      1. What is the main role of convolutional layers in a CNN for image classification?
      easy
      A. To detect features like edges and textures in small parts of the image
      B. To reduce the size of the image by downsampling
      C. To combine all features into a final decision
      D. To randomly change pixel values for data augmentation

      Solution

      1. Step 1: Understand convolutional layers

        Convolutional layers scan small parts of the image to find patterns like edges and textures.
      2. Step 2: Compare with other layers

        Pooling layers reduce image size, and fully connected layers make the final classification decision.
      3. Final Answer:

        To detect features like edges and textures in small parts of the image -> Option A
      4. Quick Check:

        Convolutional layers = feature detection [OK]
      Hint: Convolution layers find patterns, pooling shrinks images [OK]
      Common Mistakes:
      • Confusing pooling with convolution
      • Thinking fully connected layers detect features
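      • Believing the main job of convolution layers is to shrink the image (downsampling is the pooling layer's role; an unpadded convolution only trims the borders)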
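How a filter "scans small parts of the image" can be shown without any library. This is a plain-Python sketch of the sliding-window operation with a hand-written vertical-edge filter. In a real CNN the filter values are learned during training, so the kernel here is only an illustrative assumption.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical-edge (Sobel-style) filter, not a learned one.
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]

feature_map = correlate2d(image, kernel)
for row in feature_map:
    print(row)   # each row is [4, 4, 0]
```

The response is large where the window covers the dark-to-bright edge and zero where the window sees a uniform region. The 5x5 input becomes a 3x3 feature map because no padding is used.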
      2. Which of the following is the correct way to define a 2D convolutional layer in PyTorch with 3 input channels, 16 output channels, and a kernel size of 3?
      easy
      A. nn.Conv2d(16, 3, kernel_size=3)
      B. nn.Conv1d(3, 16, kernel_size=3)
      C. nn.Linear(3, 16, kernel_size=3)
      D. nn.Conv2d(3, 16, kernel_size=3)

      Solution

      1. Step 1: Identify correct layer type and parameters

        For images, use nn.Conv2d with input channels first, then output channels, and kernel size.
      2. Step 2: Check each option

        Option D, nn.Conv2d(3, 16, kernel_size=3), is correct: a 2D convolution with 3 input channels, 16 output channels, and a 3x3 kernel. Option B uses Conv1d, which is meant for 1D sequences rather than images. Option C uses nn.Linear, which is not a convolution and does not accept a kernel_size argument. Option A reverses the input and output channels.
      3. Final Answer:

        nn.Conv2d(3, 16, kernel_size=3) -> Option D
      4. Quick Check:

        Conv2d(input_channels, output_channels, kernel_size) = D [OK]
      Hint: Conv2d uses (in_channels, out_channels, kernel_size) order [OK]
      Common Mistakes:
      • Using Conv1d instead of Conv2d for images
      • Swapping input and output channels
      • Using Linear layer for convolution
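A short check of the layer from this question, assuming a single 32x32 RGB image as input (the input size is an assumption for the example). The weight tensor shape shows the argument order directly: output channels first, then input channels, then the kernel height and width.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# One filter per output channel, each spanning all 3 input channels.
print(conv.weight.shape)   # torch.Size([16, 3, 3, 3])

batch = torch.randn(1, 3, 32, 32)   # assumed input: one 32x32 RGB image
out = conv(batch)
print(out.shape)           # torch.Size([1, 16, 30, 30])
```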
      3. Given the following PyTorch CNN snippet, what is the output shape after the convolution and pooling layers if the input image size is (3, 32, 32)?
      import torch
      import torch.nn as nn
      
      class SimpleCNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
              self.pool = nn.MaxPool2d(2, 2)
          def forward(self, x):
              x = self.conv(x)
              x = self.pool(x)
              return x
      
      model = SimpleCNN()
      input_tensor = torch.randn(1, 3, 32, 32)
      output = model(input_tensor)
      print(output.shape)
      medium
      A. torch.Size([1, 8, 30, 30])
      B. torch.Size([1, 8, 16, 16])
      C. torch.Size([1, 3, 16, 16])
      D. torch.Size([1, 8, 32, 32])

      Solution

      1. Step 1: Calculate output size after convolution

        Input size: 32x32, kernel=3, padding=1, stride=1 (default). Output size = (32 - 3 + 2*1)/1 + 1 = 32. Channels change from 3 to 8.
      2. Step 2: Calculate output size after max pooling

        MaxPool2d with kernel=2, stride=2 halves width and height: 32/2 = 16. Channels remain 8.
      3. Final Answer:

        torch.Size([1, 8, 16, 16]) -> Option B
      4. Quick Check:

        Conv keeps size, pooling halves it = B [OK]
      Hint: Conv with padding keeps size; pooling halves it [OK]
      Common Mistakes:
      • Ignoring padding effect on convolution output size
      • Forgetting pooling halves spatial dimensions
      • Mixing up input and output channels
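The size arithmetic from this solution can be wrapped in a small helper. This is a sketch assuming square inputs, dilation 1, and the default floor rounding. The function name output_size is hypothetical, not a PyTorch API.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d or MaxPool2d layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

after_conv = output_size(32, kernel=3, stride=1, padding=1)
after_pool = output_size(after_conv, kernel=2, stride=2)
print(after_conv, after_pool)   # 32 16
```

Without padding the same convolution would give output_size(32, kernel=3) = 30, which is where the distractor [1, 8, 30, 30] comes from.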
      4. Identify the error in this PyTorch CNN model definition for image classification:
      import torch.nn as nn
      
      class Net(nn.Module):
          def __init__(self):
              super(Net, self).__init__()
              self.conv1 = nn.Conv2d(3, 16, 3)
              self.pool = nn.MaxPool2d(2, 2)
              self.fc1 = nn.Linear(16 * 15 * 15, 10)
      
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = x.view(-1, 16 * 15 * 15)
              x = self.fc1(x)
              return x
      medium
      A. Pooling layer should come before convolution
      B. The input size to fc1 is incorrect due to convolution output size mismatch
      C. Missing import for torch.nn.functional as F
      D. The number of output classes in fc1 should be 16

      Solution

      1. Step 1: Check imports and usage

        The forward method uses F.relu but torch.nn.functional as F is not imported, causing a NameError.
      2. Step 2: Verify other parts

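        The fc1 input size is correct if the input image is 32x32 (the snippet does not state the input size, so this is an assumption): with kernel=3 and no padding the convolution gives 30x30, and pooling halves that to 15x15, so 16 * 15 * 15 matches. Placing pooling after the convolution is correct, and 10 output classes is a valid choice.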
      3. Final Answer:

        Missing import for torch.nn.functional as F -> Option C
      4. Quick Check:

        Using F.relu without import = C [OK]
      Hint: Check all used modules are imported [OK]
      Common Mistakes:
      • Forgetting to import torch.nn.functional as F
      • Miscalculating fc1 input size
      • Changing layer order incorrectly
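For comparison, here is a sketch of the same model with the missing import added, run on a dummy input. It assumes 32x32 RGB images, which is the size the 16 * 15 * 15 value implies.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F   # the import the quiz snippet was missing

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)          # 32x32 -> 30x30
        self.pool = nn.MaxPool2d(2, 2)            # 30x30 -> 15x15
        self.fc1 = nn.Linear(16 * 15 * 15, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(-1, 16 * 15 * 15)              # flatten for the linear layer
        return self.fc1(x)

model = Net()
scores = model(torch.randn(1, 3, 32, 32))   # assumed 32x32 RGB input
print(scores.shape)   # torch.Size([1, 10])
```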
      5. You want to build a CNN in PyTorch to classify 64x64 RGB images into 5 classes. Which architecture below correctly combines convolution, pooling, and fully connected layers to achieve this?
      hard
      A.
      class CNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv1 = nn.Conv2d(3, 10, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(10, 20, 5)
              self.fc1 = nn.Linear(20 * 13 * 13, 50)
              self.fc2 = nn.Linear(50, 5)
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = x.view(-1, 20 * 13 * 13)
              x = F.relu(self.fc1(x))
              x = self.fc2(x)
              return x
      B.
      class CNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv1 = nn.Conv2d(3, 10, 3)
              self.pool = nn.MaxPool2d(2, 2)
              self.fc1 = nn.Linear(10 * 32 * 32, 5)
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = x.view(-1, 10 * 32 * 32)
              x = self.fc1(x)
              return x
      C.
      class CNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv1 = nn.Conv2d(3, 10, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(10, 20, 5)
              self.fc1 = nn.Linear(20 * 12 * 12, 50)
              self.fc2 = nn.Linear(50, 5)
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = x.view(-1, 20 * 12 * 12)
              x = F.relu(self.fc1(x))
              x = self.fc2(x)
              return x
      D.
      class CNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv1 = nn.Conv2d(3, 10, 5)
              self.pool = nn.MaxPool2d(2, 2)
              self.conv2 = nn.Conv2d(10, 20, 5)
              self.fc1 = nn.Linear(20 * 14 * 14, 50)
              self.fc2 = nn.Linear(50, 5)
          def forward(self, x):
              x = self.pool(F.relu(self.conv1(x)))
              x = self.pool(F.relu(self.conv2(x)))
              x = x.view(-1, 20 * 14 * 14)
              x = F.relu(self.fc1(x))
              x = self.fc2(x)
              return x

      Solution

      1. Step 1: Calculate output sizes after conv and pooling layers

        Input: 64x64. Conv1 (kernel=5, padding=0): 64 - 5 + 1 = 60; pool (kernel=2, stride=2): 60 / 2 = 30. Conv2 (kernel=5): 30 - 5 + 1 = 26; pool: 26 / 2 = 13. Final feature map: 20 channels of 13x13, so the flattened size is 20 * 13 * 13.
      2. Step 2: Check fc1 input sizes

        Option A: 20 * 13 * 13 is correct.
        Option B: a single conv with kernel=3 followed by pooling gives 10 * 31 * 31 (64 -> 62 -> 31), but the code uses 10 * 32 * 32, which is wrong.
        Option C: 20 * 12 * 12 is too small.
        Option D: 20 * 14 * 14 is too big.
      3. Final Answer:

        The architecture that uses nn.Linear(20 * 13 * 13, 50) -> Option A
      4. Quick Check:

        64->60->30->26->13 = 20x13x13 -> A [OK]
      Hint: Calculate conv and pool sizes stepwise to find fc input size [OK]
      Common Mistakes:
      • Ignoring how kernel size reduces image dimensions
      • Assuming pooling does not halve size
      • Mismatching fc layer input size with conv output
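One way to guard against a mismatched fc input size is to pass a dummy tensor through the convolution and pooling layers and read the shape back. The sketch below rebuilds the feature-extraction part of Option A as an nn.Sequential (a restructuring chosen for brevity, not how the option itself is written).

```python
import torch
import torch.nn as nn

# Convolution and pooling stack from Option A, written as a Sequential for brevity.
features = nn.Sequential(
    nn.Conv2d(3, 10, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(10, 20, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
)

with torch.no_grad():
    dummy = torch.zeros(1, 3, 64, 64)   # one 64x64 RGB image
    out = features(dummy)

flat_features = out[0].numel()   # number of values per image after flattening
print(out.shape)       # torch.Size([1, 20, 13, 13])
print(flat_features)   # 3380
```

The value 3380 equals 20 * 13 * 13, matching the hand calculation 64 -> 60 -> 30 -> 26 -> 13.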