CNN architecture for image classification in PyTorch

Introduction

A convolutional neural network (CNN) learns to recognize images by scanning small regions with learned filters and combining what it finds layer by layer.

Typical uses:

Telling whether a photo shows a cat or a dog.

Sorting pictures into groups such as cars, trees, or people.

Finding objects in photos, such as faces or signs.

Improving photo search by recognizing what is in each image.

Building apps that need to understand images, such as photo filters.

Syntax

PyTorch

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 10)  # assuming input images are 32x32

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)  # flatten
        x = self.fc1(x)
        return x

Input images are expected to have 3 color channels (RGB) and size 32x32 pixels.

Output layer size (10) matches the number of classes to predict.
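
The input size of fc1 (32 * 8 * 8) follows from the layer settings: padding=1 keeps each 3x3 convolution at the same height and width, and each 2x2 pooling step halves them (32 -> 16 -> 8). The sketch below traces those shapes by applying the same layers one at a time to a random batch; the batch size of 4 and the variable names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Same layer settings as SimpleCNN above, applied one at a time to trace shapes.
conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
relu = nn.ReLU()

x = torch.randn(4, 3, 32, 32)     # hypothetical batch of 4 random RGB "images"
h1 = pool(relu(conv1(x)))         # conv keeps 32x32, pooling halves it to 16x16
h2 = pool(relu(conv2(h1)))        # conv keeps 16x16, pooling halves it to 8x8
flat = h2.view(h2.size(0), -1)    # one row of 32 * 8 * 8 values per image

print(h1.shape)    # torch.Size([4, 16, 16, 16])
print(h2.shape)    # torch.Size([4, 32, 8, 8])
print(flat.shape)  # torch.Size([4, 2048])
```

If the input images were a different size, the 8 x 8 in fc1 would have to change accordingly.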

Examples

The first convolution layer applies 16 filters of size 3x3; the 2x2 max pooling that follows halves the height and width of the feature maps.

PyTorch

self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
self.pool = nn.MaxPool2d(2, 2)
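
To see what padding=1 and the pooling layer do to the spatial size, the following sketch compares the padded convolution from the example with a hypothetical unpadded variant (the names same, valid, and pooled are illustrative, not part of the original model).

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)           # one random RGB image; values are arbitrary

same = nn.Conv2d(3, 16, 3, padding=1)   # as in the example above
valid = nn.Conv2d(3, 16, 3)             # hypothetical variant without padding
pool = nn.MaxPool2d(2, 2)

same_out = same(x)       # padding=1 preserves height and width
valid_out = valid(x)     # a 3x3 kernel without padding trims 2 pixels per dimension
pooled = pool(same_out)  # 2x2 pooling with stride 2 halves height and width

print(same_out.shape)   # torch.Size([1, 16, 32, 32])
print(valid_out.shape)  # torch.Size([1, 16, 30, 30])
print(pooled.shape)     # torch.Size([1, 16, 16, 16])
```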

Flatten the 3D feature maps of each image into a 1D vector before feeding them into the fully connected layer.

PyTorch

x = x.view(x.size(0), -1)
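
As a small illustration, the sketch below flattens a tensor shaped like the output of the second pooling layer. It also shows torch.flatten with start_dim=1, which is an alternative spelling of the same reshape; the tensor contents are arbitrary.

```python
import torch

# Stand-in for the pooled feature maps: 2 images, 32 channels, 8x8 each.
x = torch.arange(2 * 32 * 8 * 8, dtype=torch.float32).reshape(2, 32, 8, 8)

a = x.view(x.size(0), -1)           # keep the batch dimension, merge the rest
b = torch.flatten(x, start_dim=1)   # equivalent reshape using torch.flatten

print(a.shape)            # torch.Size([2, 2048])
print(torch.equal(a, b))  # True
```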

Fully connected layer that outputs 10 class scores from the flattened features.

PyTorch

self.fc1 = nn.Linear(32 * 8 * 8, 10)
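
The 10 outputs are raw scores (logits), not probabilities. nn.CrossEntropyLoss expects logits directly, so no softmax is needed inside the model. A minimal sketch, assuming random stand-in features, of how the scores can be turned into probabilities and a predicted class for inspection:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                    # only to make the random numbers repeatable

fc1 = nn.Linear(32 * 8 * 8, 10)
features = torch.randn(4, 32 * 8 * 8)   # stand-in for flattened feature maps

scores = fc1(features)                  # raw class scores, shape (4, 10)
probs = torch.softmax(scores, dim=1)    # each row becomes a probability distribution
predicted = scores.argmax(dim=1)        # index of the highest score per image

print(scores.shape)      # torch.Size([4, 10])
print(probs.sum(dim=1))  # each entry is approximately 1
print(predicted.shape)   # torch.Size([4])
```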

Sample Model

This code trains the CNN on one batch of CIFAR10 images and prints the loss and predicted classes for the first 5 images.

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Define CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

# Prepare data (CIFAR10, small image dataset)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)

# Initialize model, loss, optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train on a single batch (the loop breaks after the first iteration)
model.train()
for images, labels in trainloader:
    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    break  # train on only 1 batch for demo

# Print loss and prediction example
print(f"Loss after 1 batch: {loss.item():.4f}")
_, predicted = torch.max(outputs, 1)
print(f"Predicted classes for first 5 images: {predicted[:5].tolist()}")
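
The predicted classes can be compared with the labels to get an accuracy figure. The sketch below uses small hand-written logits and labels (hypothetical values, 3 classes instead of 10) so that it runs without downloading CIFAR10; with the sample above, outputs and labels from the training loop would take their place.

```python
import torch

# Hypothetical logits for 4 images and 3 classes.
outputs = torch.tensor([[2.0, 0.1, -1.0],
                        [0.2, 1.5, 0.3],
                        [0.0, 0.1, 3.0],
                        [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 1, 1, 0])     # hypothetical ground-truth classes

predicted = outputs.argmax(dim=1)       # class with the highest score per image
accuracy = (predicted == labels).float().mean().item()

print(predicted.tolist())  # [0, 1, 2, 0]
print(accuracy)            # 0.75
```

Note that in the sample above, outputs is computed before optimizer.step(), so the printed predictions reflect the model before that batch's update.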

Important Notes

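When starting out, keep experiments small, for example a modest batch size and only a few batches as in the sample above, so each run finishes quickly.

ReLU adds non-linearity by setting negative values to zero; without an activation, consecutive convolution or linear layers would compose into a single linear map.

Pooling reduces the spatial size of the feature maps. A 2x2 max pooling with stride 2 halves height and width and keeps the strongest activation in each window.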
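
A tiny worked example can make the last two notes concrete. The 4x4 input below is hand-written for illustration: ReLU zeroes the negative entries, and 2x2 max pooling then keeps the largest value in each 2x2 window.

```python
import torch
import torch.nn as nn

# One image, one channel, 4x4 pixels (hand-picked values).
x = torch.tensor([[[[-1.0,  2.0,  0.5, -3.0],
                    [ 4.0, -2.0,  1.0,  0.0],
                    [ 0.0,  1.0, -1.0, -2.0],
                    [ 3.0, -4.0,  2.0,  5.0]]]])

activated = nn.ReLU()(x)                  # negatives become 0
pooled = nn.MaxPool2d(2, 2)(activated)    # max over each 2x2 window

print(activated[0, 0])
# tensor([[0.0000, 2.0000, 0.5000, 0.0000],
#         [4.0000, 0.0000, 1.0000, 0.0000],
#         [0.0000, 1.0000, 0.0000, 0.0000],
#         [3.0000, 0.0000, 2.0000, 5.0000]])
print(pooled[0, 0])
# tensor([[4., 1.],
#         [3., 5.]])
```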

Summary

CNNs look at images in small parts to learn patterns.

Convolution layers find features, pooling layers shrink images, and fully connected layers decide the class.

Training adjusts the CNN to recognize images correctly.
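
To illustrate what "finding features" means, the sketch below applies a hand-written vertical-edge kernel to a tiny image whose left side is dark and right side is bright. This is only an illustration: a trained CNN learns its kernel values from data, and the image and kernel here are assumptions.

```python
import torch
import torch.nn.functional as F

# 5x5 grayscale image: columns 0-2 are dark (0), columns 3-4 are bright (1).
image = torch.zeros(1, 1, 5, 5)
image[:, :, :, 3:] = 1.0

# Hand-written kernel that responds to a dark-to-bright change from left to right.
kernel = torch.tensor([[[[-1.0, 0.0, 1.0],
                         [-1.0, 0.0, 1.0],
                         [-1.0, 0.0, 1.0]]]])   # shape (out=1, in=1, 3, 3)

response = F.conv2d(image, kernel)   # no padding, so the output is 3x3

print(response[0, 0])
# tensor([[0., 3., 3.],
#         [0., 3., 3.],
#         [0., 3., 3.]])
```

The response is zero over the uniform dark region and large where the window overlaps the edge.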

Practice

1. What is the main role of convolutional layers in a CNN for image classification?

easy

A. To detect features like edges and textures in small parts of the image

B. To reduce the size of the image by downsampling

C. To combine all features into a final decision

D. To randomly change pixel values for data augmentation

5. You want to build a CNN in PyTorch to classify 64x64 RGB images into 5 classes. Which architecture below correctly combines convolution, pooling, and fully connected layers to achieve this? Assume torch.nn is imported as nn and torch.nn.functional as F.

hard

A.

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 13 * 13, 50)
        self.fc2 = nn.Linear(50, 5)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 13 * 13)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

B.

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(10 * 32 * 32, 5)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(-1, 10 * 32 * 32)
        x = self.fc1(x)
        return x

C.

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 12 * 12, 50)
        self.fc2 = nn.Linear(50, 5)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 12 * 12)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

D.

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 14 * 14, 50)
        self.fc2 = nn.Linear(50, 5)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 14 * 14)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Solution

Step 1: Calculate output sizes after conv and pooling layers
Input: 64x64. Conv1 kernel=5, padding=0: (64-5+1)=60, pool kernel=2 stride=2: 60/2=30. Conv2 kernel=5: (30-5+1)=26, pool: 26/2=13. Final size 20x13x13.
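
The arithmetic in Step 1 can be reproduced with two small helper functions. The names conv_out and pool_out are hypothetical helpers written for this sketch; the formula floor((size + 2 * padding - kernel) / stride) + 1 is the standard output-size rule for convolution and pooling without dilation.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output height/width of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output height/width of a pooling layer without padding."""
    return (size - kernel) // stride + 1

size = 64
trace = [size]
for _ in range(2):                      # two conv(kernel=5) + pool(2, 2) stages
    size = conv_out(size, kernel=5)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

print(trace)             # [64, 60, 30, 26, 13]
print(20 * size * size)  # 3380, the flattened size for 20 channels
```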

Step 2: Check fc1 input sizes
Option A: nn.Linear(20 * 13 * 13, 50) matches the 20x13x13 output. Correct.
Option B: a single conv with kernel=3 followed by pooling gives 10*31*31 (64 -> 62 -> 31), but the layer uses 10*32*32. Wrong.
Option C: nn.Linear(20 * 12 * 12, 50) is too small.
Option D: nn.Linear(20 * 14 * 14, 50) is too big.

Final Answer:
nn.Linear(20 * 13 * 13, 50) -> Option A

Quick Check:
64 -> 60 -> 30 -> 26 -> 13, so the flattened size is 20 x 13 x 13 -> A

Hint: Calculate conv and pool sizes stepwise to find the fc input size.

Common Mistakes:

Ignoring how kernel size reduces image dimensions
Assuming pooling does not halve size
Mismatching fc layer input size with conv output
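
One way to guard against the last mistake is to let PyTorch report the flattened size by passing a dummy tensor through the convolutional part once. The sketch below does this for the 64x64 layout from the question; splitting the model into features and classifier is an assumption made for illustration, not the structure used in the options.

```python
import torch
import torch.nn as nn

# Convolutional part with the same layer settings as the correct option.
features = nn.Sequential(
    nn.Conv2d(3, 10, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(10, 20, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
)

# Run one dummy 64x64 RGB image through it to measure the flattened size.
with torch.no_grad():
    dummy = torch.zeros(1, 3, 64, 64)
    n_features = features(dummy).flatten(1).size(1)

classifier = nn.Sequential(
    nn.Linear(n_features, 50), nn.ReLU(), nn.Linear(50, 5),
)

batch = torch.randn(8, 3, 64, 64)              # hypothetical batch of 8 images
out = classifier(features(batch).flatten(1))

print(n_features)  # 3380, which equals 20 * 13 * 13
print(out.shape)   # torch.Size([8, 5])
```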
