Data Flow - 13 Stages

1. Input Images

1000 images x 3 channels x 32 height x 32 width → Raw image data loaded from dataset → 1000 images x 3 channels x 32 height x 32 width

An image of a cat represented as a 3D array of pixel colors

↓

2. Normalization

1000 images x 3 channels x 32 height x 32 width → Scale pixel values from 0-255 to 0-1 → 1000 images x 3 channels x 32 height x 32 width

Pixel value 128 becomes 0.502
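
The scaling in this stage is a division by 255. A minimal sketch in plain Python, assuming 8-bit pixel values; the helper name `normalize_pixels` is made up for illustration and is not part of any library:

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0-255) into the range 0-1."""
    return [value / 255.0 for value in pixels]


sample = [0, 128, 255]
normalized = normalize_pixels(sample)
print([round(v, 3) for v in normalized])  # [0.0, 0.502, 1.0]
```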

↓

3. Convolutional Layer 1

1000 images x 3 channels x 32 height x 32 width → Apply 16 filters of size 3x3 with stride 1 and padding 1 → 1000 images x 16 channels x 32 height x 32 width

Detect edges and simple shapes in images
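
Why the height and width stay at 32 here can be checked with the standard convolution output-size formula, floor((size + 2*padding - kernel) / stride) + 1. The sketch below is plain Python, not PyTorch itself; the function names are illustrative assumptions, and the parameter count assumes one bias per filter:

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_param_count(in_channels, out_channels, kernel):
    """Weights plus one bias per output filter."""
    return out_channels * (in_channels * kernel * kernel) + out_channels


height = conv2d_output_size(32, kernel=3, stride=1, padding=1)
params = conv2d_param_count(in_channels=3, out_channels=16, kernel=3)
print(height, params)  # 32 448
```

With padding 1 and a 3x3 kernel the border pixels lost to the kernel are restored, so only the channel count changes (3 to 16).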

↓

4. ReLU Activation

1000 images x 16 channels x 32 height x 32 width → Apply ReLU to add non-linearity → 1000 images x 16 channels x 32 height x 32 width

Negative values become 0; positive values stay the same
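
ReLU is an element-wise max(0, x). A small plain-Python sketch of that rule; the `relu` helper is written here for illustration only:

```python
def relu(values):
    """Replace every negative value with 0 and keep the rest unchanged."""
    return [max(0.0, v) for v in values]


print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))  # [0.0, 0.0, 0.0, 1.5, 3.0]
```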

↓

5. Max Pooling

1000 images x 16 channels x 32 height x 32 width → Downsample by taking max over 2x2 regions with stride 2 → 1000 images x 16 channels x 16 height x 16 width

Reduce image size while keeping important features
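
To make the downsampling concrete, here is a sketch of 2x2 max pooling with stride 2 on a single 4x4 feature map, using nested lists rather than tensors. The function name and the sample values are made up for illustration:

```python
def max_pool_2x2(feature_map):
    """Take the max over each non-overlapping 2x2 window (stride 2)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [6, 7]]
```

Each output value is the strongest activation in its window, which is why height and width halve while the channel count is unchanged.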

↓

6. Convolutional Layer 2

1000 images x 16 channels x 16 height x 16 width → Apply 32 filters of size 3x3 with stride 1 and padding 1 → 1000 images x 32 channels x 16 height x 16 width

Detect more complex patterns

↓

7. ReLU Activation

1000 images x 32 channels x 16 height x 16 width → Apply ReLU → 1000 images x 32 channels x 16 height x 16 width

Keep positive activations

↓

8. Max Pooling

1000 images x 32 channels x 16 height x 16 width → Downsample by 2x2 max pooling → 1000 images x 32 channels x 8 height x 8 width

Further reduce spatial size
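
The spatial sizes through both convolution and pooling blocks (stages 3 to 8) can be traced with two small formulas. This is a plain-Python sketch under the stated layer settings (3x3 kernels, stride 1, padding 1, then 2x2 pooling with stride 2); `conv_out` and `pool_out` are illustrative helper names:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of max pooling along one spatial dimension."""
    return (size - kernel) // stride + 1


size = 32
trace = [size]
for _ in range(2):  # two conv + pool blocks
    size = conv_out(size, kernel=3, stride=1, padding=1)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

print(trace)  # [32, 32, 16, 16, 8]
```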

↓

9. Flatten

1000 images x 32 channels x 8 height x 8 width → Convert 3D feature maps to 1D vector → 1000 images x 2048 features

32*8*8 = 2048 features per image
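
As a quick check of that count, the sketch below flattens one image's feature maps, represented as a nested channels x height x width list of placeholder zeros. The `flatten` helper is illustrative, not a library function:

```python
def flatten(feature_maps):
    """Flatten a channels x height x width nested list into one flat list."""
    return [value for channel in feature_maps for row in channel for value in row]


channels, height, width = 32, 8, 8
feature_maps = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
flat = flatten(feature_maps)
print(len(flat))  # 2048
```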

↓

10. Fully Connected Layer

1000 images x 2048 features → Linear layer to 64 neurons → 1000 images x 64 features

Combine features to learn complex relations
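
A linear layer computes a weighted sum of all inputs plus a bias for each output neuron. The sketch below shows that computation on a tiny made-up example and counts the parameters for the 2048-to-64 layer, assuming one bias per neuron; both helper names are hypothetical:

```python
def linear_param_count(in_features, out_features):
    """One weight per input-output pair, plus one bias per output."""
    return in_features * out_features + out_features


def linear_forward(x, weights, bias):
    """Compute weights @ x + bias for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


print(linear_param_count(2048, 64))  # 131136
print(linear_forward([1.0, 2.0], [[0.5, 0.5], [1.0, -1.0]], [0.0, 1.0]))  # [1.5, 0.0]
```

Most of this network's parameters sit in this one layer, which is one reason pooling is applied before flattening.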

↓

11. ReLU Activation

1000 images x 64 features → Apply ReLU → 1000 images x 64 features

Add non-linearity

↓

12. Output Layer

1000 images x 64 features → Linear layer to 10 classes → 1000 images x 10 classes

Predict scores for 10 categories

↓

13. Softmax

1000 images x 10 classes → Convert scores to probabilities → 1000 images x 10 classes

Probabilities sum to 1 for each image
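
Softmax exponentiates each score and divides by the sum of the exponentials. A minimal plain-Python sketch for one image's scores; the three sample scores are made up, and subtracting the maximum first is a common numerical-stability step that does not change the result:

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))  # 1.0
```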

Practice


1. What is the main role of convolutional layers in a CNN for image classification?

easy

A. To detect features like edges and textures in small parts of the image

B. To reduce the size of the image by downsampling

C. To combine all features into a final decision

D. To randomly change pixel values for data augmentation

5. You want to build a CNN in PyTorch to classify 64x64 RGB images into 5 classes. Which architecture below correctly combines convolution, pooling, and fully connected layers to achieve this? Assume `import torch.nn as nn` and `import torch.nn.functional as F` in every option.

hard

A.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 13 * 13, 50)
        self.fc2 = nn.Linear(50, 5)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 13 * 13)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

B.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(10 * 32 * 32, 5)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(-1, 10 * 32 * 32)
        x = self.fc1(x)
        return x

C.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 12 * 12, 50)
        self.fc2 = nn.Linear(50, 5)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 12 * 12)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

D.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc1 = nn.Linear(20 * 14 * 14, 50)
        self.fc2 = nn.Linear(50, 5)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 20 * 14 * 14)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Solution

Step 1: Calculate output sizes after conv and pooling layers
Input: 64x64. Conv1 kernel=5, padding=0: (64-5+1)=60, pool kernel=2 stride=2: 60/2=30. Conv2 kernel=5: (30-5+1)=26, pool: 26/2=13. Final size 20x13x13.
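
The same stepwise arithmetic can be scripted. The sketch below is plain Python, not PyTorch, and assumes what the options use: unpadded stride-1 convolutions, each followed by 2x2 max pooling with stride 2. The helper names are hypothetical:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of max pooling along one spatial dimension."""
    return (size - kernel) // stride + 1


def fc_input_features(image_size, conv_kernels, last_channels):
    """Flattened feature count after each conv is followed by one pooling step."""
    size = image_size
    for kernel in conv_kernels:
        size = pool_out(conv_out(size, kernel))
    return last_channels * size * size


print(fc_input_features(64, [5, 5], 20))  # 3380, i.e. 20 * 13 * 13
print(fc_input_features(64, [3], 10))     # 9610, i.e. 10 * 31 * 31
```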

Step 2: Check fc1 input sizes

Option A, nn.Linear(20 * 13 * 13, 50): matches the computed 20x13x13 output, so it is correct.
Option B, nn.Linear(10 * 32 * 32, 5): a single 3x3 convolution without padding gives 64 -> 62, and pooling gives 31, so the flattened size is 10*31*31, not 10*32*32.
Option C, nn.Linear(20 * 12 * 12, 50): 20*12*12 is too small.
Option D, nn.Linear(20 * 14 * 14, 50): 20*14*14 is too large.

Final Answer:
nn.Linear(20 * 13 * 13, 50) -> Option A
Quick Check:
64 -> 60 -> 30 -> 26 -> 13, so fc1 takes 20 x 13 x 13 = 3380 inputs -> A

Hint: Compute the conv and pool output sizes step by step to find the fc1 input size.

Common Mistakes:

Ignoring how kernel size reduces image dimensions
Forgetting that 2x2 pooling with stride 2 halves each spatial dimension
Mismatching fc layer input size with conv output

Epoch	Loss ↓	Accuracy ↑	Observation
1	1.85	0.35	Model starts learning basic patterns
2	1.25	0.55	Loss decreases, accuracy improves
3	0.95	0.68	Model captures more features
4	0.75	0.75	Good convergence, accuracy rising
5	0.60	0.81	Model stabilizes with better accuracy
6	0.52	0.85	Further improvement, loss lowers
7	0.47	0.87	Training converging well
8	0.43	0.89	High accuracy, low loss
9	0.40	0.90	Model near optimal performance
10	0.38	0.91	Training complete with good results

CNN architecture for image classification in PyTorch - Model Pipeline Trace
