A CNN helps a computer learn to recognize pictures by looking at small parts step-by-step.
CNN architecture for image classification in PyTorch
Introduction
Syntax
PyTorch
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 10)  # assuming input images are 32x32

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)  # flatten
        x = self.fc1(x)
        return x
Input images are expected to have 3 color channels (RGB) and size 32x32 pixels.
Output layer size (10) matches the number of classes to predict.
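One way to check both notes is to pass a random batch through the model and inspect the output shape. The sketch below repeats the model definition so it runs on its own; the batch size of 4 and the name `dummy_batch` are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))  # 32x32 -> 16x16
        x = self.pool(self.relu(self.conv2(x)))  # 16x16 -> 8x8
        x = x.view(x.size(0), -1)                # flatten to (batch, 32*8*8)
        return self.fc1(x)


model = SimpleCNN()
dummy_batch = torch.randn(4, 3, 32, 32)  # assumed: 4 random RGB "images", 32x32
logits = model(dummy_batch)
print(logits.shape)  # one row of 10 class scores per image
```

If the input were not 32x32, the flattened size would no longer be 32 * 8 * 8 and `fc1` would raise a shape mismatch error.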
Examples
PyTorch
self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
self.pool = nn.MaxPool2d(2, 2)
PyTorch
x = x.view(x.size(0), -1)
PyTorch
self.fc1 = nn.Linear(32 * 8 * 8, 10)
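The `32 * 8 * 8` in the last example follows from the standard output-size formula, output = (size + 2 * padding - kernel) / stride + 1, applied to each layer in turn. The plain-Python sketch below walks through that arithmetic for a 32x32 input; the helper name `conv_output_size` is made up for this illustration and is not part of PyTorch.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32                                                # input height/width
size = conv_output_size(size, kernel_size=3, padding=1)  # conv1 keeps 32
size = conv_output_size(size, kernel_size=2, stride=2)   # pool halves to 16
size = conv_output_size(size, kernel_size=3, padding=1)  # conv2 keeps 16
size = conv_output_size(size, kernel_size=2, stride=2)   # pool halves to 8

fc_in_features = 32 * size * size  # 32 channels come out of conv2
print(size, fc_in_features)
```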
Sample Model
This code trains the CNN on one batch of CIFAR10 images and prints the loss and predicted classes for the first 5 images.
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


# Define CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x


# Prepare data (CIFAR10, small image dataset)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)

# Initialize model, loss, optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train on a single batch (remove the break to run a full epoch)
model.train()
for images, labels in trainloader:
    optimizer.zero_grad()              # clear gradients from the previous step
    outputs = model(images)            # forward pass
    loss = criterion(outputs, labels)  # compare class scores with true labels
    loss.backward()                    # backpropagate
    optimizer.step()                   # update weights
    break  # train on only 1 batch for demo

# Print loss and prediction example
print(f"Loss after 1 batch: {loss.item():.4f}")
_, predicted = torch.max(outputs, 1)
print(f"Predicted classes for first 5 images: {predicted[:5].tolist()}")
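Predicted classes only become meaningful once they are compared with the true labels. The sketch below shows one common way to turn class scores into an accuracy value; the score and label tensors are made up for illustration (3 classes, 4 images) and do not come from CIFAR10.

```python
import torch

# Made-up class scores for 4 images and 3 classes (illustrative values only)
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.2, 0.3, 1.5],
                       [0.0, 3.0, 0.5],
                       [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 2, 0, 0])  # made-up true classes

predicted = logits.argmax(dim=1)  # index of the highest score per image
accuracy = (predicted == labels).float().mean().item()
print(predicted.tolist(), accuracy)
```

`argmax(dim=1)` returns the same indices as the second value of `torch.max(outputs, 1)` used in the sample model.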
Important Notes
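When starting out, use a modest batch size (such as 64) and train on only a few batches so each experiment runs quickly.
ReLU adds non-linearity; without it, stacked convolution layers would behave like a single linear operation.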
Pooling reduces image size and helps the model focus on important features.
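The effect of ReLU and pooling is easy to see on a tiny tensor. In this sketch the 4x4 input values are arbitrary numbers chosen for illustration: ReLU replaces negative values with zero, and 2x2 max pooling keeps the largest value in each 2x2 block.

```python
import torch
import torch.nn as nn

# Arbitrary 4x4 single-channel input, shape (batch=1, channels=1, 4, 4)
x = torch.tensor([[[[-1.0,  2.0,  0.5, -3.0],
                    [ 4.0, -2.0,  1.0,  0.0],
                    [ 0.0,  1.0, -1.0,  2.0],
                    [ 3.0, -4.0,  5.0, -6.0]]]])

activated = nn.ReLU()(x)                # negatives become 0
pooled = nn.MaxPool2d(2, 2)(activated)  # 4x4 -> 2x2, max of each block

print(activated.squeeze())
print(pooled.squeeze())
```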
Summary
CNNs look at images in small parts to learn patterns.
Convolution layers find features, pooling layers shrink images, and fully connected layers decide the class.
Training adjusts the CNN's weights so that its predictions match the true labels more often.
Practice
1. What is the main role of convolutional layers in a CNN for image classification?
easy
Solution
Step 1: Understand convolutional layers
Convolutional layers scan small parts of the image to find patterns like edges and textures.
Step 2: Compare with other layers
Pooling layers reduce image size, and fully connected layers make the final classification decision.
Final Answer:
To detect features like edges and textures in small parts of the image -> Option A
Quick Check:
Convolutional layers = feature detection [OK]
Hint: Convolution layers find patterns, pooling shrinks images [OK]
Common Mistakes:
- Confusing pooling with convolution
- Thinking fully connected layers detect features
- Believing the main job of convolution layers is to shrink the image (that is pooling's role)
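To make "detecting edges" concrete, the sketch below applies a single hand-written 3x3 kernel to a tiny image whose left half is dark and right half is bright. This is only an illustration: in a real CNN the kernel values are learned during training, and the image and kernel here are made up.

```python
import torch
import torch.nn.functional as F

# Made-up 4x6 grayscale image: left three columns 0 (dark), right three 1 (bright)
image = torch.zeros(1, 1, 4, 6)
image[..., 3:] = 1.0

# Hand-written vertical-edge kernel, shape (out_channels=1, in_channels=1, 3, 3)
kernel = torch.tensor([[-1.0, 0.0, 1.0]] * 3).reshape(1, 1, 3, 3)

response = F.conv2d(image, kernel)  # no padding, so the output is 2x4
print(response.squeeze())
```

The response is zero over flat regions and positive only where the 3x3 window straddles the dark-to-bright boundary, which is the sense in which a convolution filter "finds" a feature.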
2. Which of the following is the correct way to define a 2D convolutional layer in PyTorch with 3 input channels, 16 output channels, and a kernel size of 3?
easy
Solution
Step 1: Identify correct layer type and parameters
For images, use nn.Conv2d with input channels first, then output channels, then kernel size.
Step 2: Check each option
- nn.Conv2d(3, 16, kernel_size=3): correct layer type and argument order.
- nn.Conv1d(3, 16, kernel_size=3): uses Conv1d (wrong dimensionality for images).
- nn.Linear(3, 16, kernel_size=3): uses Linear, which is not a convolution and does not accept kernel_size.
- nn.Conv2d(16, 3, kernel_size=3): reverses input and output channels.
Final Answer:
nn.Conv2d(3, 16, kernel_size=3) -> Option D
Quick Check:
Conv2d(input_channels, output_channels, kernel_size) [OK]
Hint: Conv2d uses (in_channels, out_channels, kernel_size) order [OK]
Common Mistakes:
- Using Conv1d instead of Conv2d for images
- Swapping input and output channels
- Using Linear layer for convolution
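The argument order can be checked directly on a layer object. The sketch below builds the correct layer, inspects its weight shape, and shows that the swapped version rejects an RGB input; the 32x32 input size and the variable names are assumptions for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)
print(conv.in_channels, conv.out_channels, conv.kernel_size)
print(conv.weight.shape)  # (out_channels, in_channels, kernel_h, kernel_w)

rgb_batch = torch.randn(1, 3, 32, 32)  # assumed: one random 32x32 RGB image
out = conv(rgb_batch)
print(out.shape)  # 16 channels; no padding, so 32 -> 30

# Swapping the channel arguments makes the layer expect 16 input channels
swapped = nn.Conv2d(16, 3, kernel_size=3)
try:
    swapped(rgb_batch)
    swapped_failed = False
except RuntimeError:
    swapped_failed = True
print(swapped_failed)
```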
3. Given the following PyTorch CNN snippet, what is the output shape after the convolution and pooling layers if the input image size is (3, 32, 32)?
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        return x

model = SimpleCNN()
input_tensor = torch.randn(1, 3, 32, 32)
output = model(input_tensor)
print(output.shape)
medium
Solution
Step 1: Calculate output size after convolution
Input size: 32x32, kernel=3, padding=1, stride=1 (default). Output size = (32 - 3 + 2*1)/1 + 1 = 32. Channels change from 3 to 8.
Step 2: Calculate output size after max pooling
MaxPool2d with kernel=2, stride=2 halves width and height: 32/2 = 16. Channels remain 8.
Final Answer:
torch.Size([1, 8, 16, 16]) -> Option B
Quick Check:
Conv keeps size, pooling halves it = B [OK]
Hint: Conv with padding keeps size; pooling halves it [OK]
Common Mistakes:
- Ignoring padding effect on convolution output size
- Forgetting pooling halves spatial dimensions
- Mixing up input and output channels
4. Identify the error in this PyTorch CNN model definition for image classification:
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 15 * 15, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(-1, 16 * 15 * 15)
        x = self.fc1(x)
        return x
medium
Solution
Step 1: Check imports and usage
The forward method uses F.relu, but torch.nn.functional is never imported as F, so calling the model raises a NameError.
Step 2: Verify other parts
For a 32x32 input, conv1 (kernel=3, no padding) produces 30x30 and pooling halves that to 15x15, so the fc1 input size of 16 * 15 * 15 is correct. Pooling after the convolution is correct, and 10 output classes is reasonable.
Final Answer:
Missing import for torch.nn.functional as F -> Option C
Quick Check:
Using F.relu without import [OK]
Hint: Check all used modules are imported [OK]
Common Mistakes:
- Forgetting to import torch.nn.functional as F
- Miscalculating fc1 input size
- Changing layer order incorrectly
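For comparison, here is a sketch of the same model with the missing import added, followed by a quick forward pass. The batch of two random 32x32 images is an assumption used only to confirm that the shapes line up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # the import the quiz snippet leaves out


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)        # 32x32 -> 30x30 (no padding)
        self.pool = nn.MaxPool2d(2, 2)          # 30x30 -> 15x15
        self.fc1 = nn.Linear(16 * 15 * 15, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(-1, 16 * 15 * 15)
        return self.fc1(x)


net = Net()
scores = net(torch.randn(2, 3, 32, 32))  # assumed: 2 random RGB images
print(scores.shape)
```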
5. You want to build a CNN in PyTorch to classify 64x64 RGB images into 5 classes. Which architecture below correctly combines convolution, pooling, and fully connected layers to achieve this?
hard
Solution
Step 1: Calculate output sizes after conv and pooling layers
Input: 64x64. Conv1 (kernel=5, padding=0): 64 - 5 + 1 = 60; pool (kernel=2, stride=2): 60 / 2 = 30. Conv2 (kernel=5): 30 - 5 + 1 = 26; pool: 26 / 2 = 13. Final feature map: 20 x 13 x 13.
Step 2: Check fc1 input sizes
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 13 * 13, 50)
        self.fc2 = nn.Linear(50, 5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 13 * 13)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
20 * 13 * 13 is correct.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(10 * 32 * 32, 5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(-1, 10 * 32 * 32)
        x = self.fc1(x)
        return x
A single conv with kernel=3 followed by pooling gives 10 * 31 * 31 (64 -> 62 -> 31), but the layer uses 10 * 32 * 32, which is wrong.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 12 * 12, 50)
        self.fc2 = nn.Linear(50, 5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 12 * 12)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
20 * 12 * 12 is too small.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 14 * 14, 50)
        self.fc2 = nn.Linear(50, 5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 14 * 14)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
20 * 14 * 14 is too big.
Final Answer:
nn.Linear(20 * 13 * 13, 50) -> Option A
Quick Check:
64->60->30->26->13 = 20x13x13 -> A [OK]
Hint: Calculate conv and pool sizes stepwise to find fc input size [OK]
Common Mistakes:
- Ignoring how kernel size reduces image dimensions
- Assuming pooling does not halve size
- Mismatching fc layer input size with conv output
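Besides working out the sizes by hand, a common way to avoid a mismatched fc input size is to push a dummy tensor through the convolution and pooling layers and read the size off the result. The sketch below does this for the 64x64 case; grouping the layers in `nn.Sequential` and the name `features` are choices made for this illustration, not part of the quiz options.

```python
import torch
import torch.nn as nn

# Convolution and pooling stack matching the correct option's layer sizes
features = nn.Sequential(
    nn.Conv2d(3, 10, 5), nn.ReLU(), nn.MaxPool2d(2, 2),   # 64 -> 60 -> 30
    nn.Conv2d(10, 20, 5), nn.ReLU(), nn.MaxPool2d(2, 2),  # 30 -> 26 -> 13
)

with torch.no_grad():
    feature_map = features(torch.zeros(1, 3, 64, 64))  # dummy 64x64 RGB input

flat_features = feature_map.flatten(1).shape[1]  # channels * height * width
print(feature_map.shape, flat_features)

fc1 = nn.Linear(flat_features, 50)  # sized from the measured value
```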
