In convolutional neural networks, what is the most common effect of increasing the number of layers (depth) on model performance?
Think about what happens when a model becomes very deep and how training might be affected.
Increasing depth allows the model to learn more complex features, but very deep networks can suffer from vanishing gradients and overfitting, which can hurt performance.
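The vanishing-gradient effect in the answer above can be sketched with simple arithmetic: backpropagation multiplies one per-layer factor per layer, and for a sigmoid activation the derivative is at most 0.25, so the product shrinks exponentially with depth. The function name `gradient_scale` and the 0.25 bound are illustrative assumptions, not a full backprop computation.

```python
# Sketch: why gradients can vanish in very deep networks.
# Backprop multiplies one factor per layer; a sigmoid's derivative is
# at most 0.25, so the product shrinks exponentially with depth.

def gradient_scale(depth, per_layer_factor=0.25):
    """Upper bound on the gradient magnitude reaching the first layer."""
    return per_layer_factor ** depth

for depth in (5, 10, 50):
    print(depth, gradient_scale(depth))
# At depth 50 the factor is about 8e-31 -- effectively zero signal
# reaches the early layers.
```

This is why remedies such as ReLU activations, careful initialization, and residual connections target the per-layer scaling of the gradient.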
Given the following PyTorch model snippet, what is the output shape after the last layer?
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        return x

model = SimpleCNN()
input_tensor = torch.randn(1, 3, 64, 64)
output = model(input_tensor)
output.shape
Calculate the size after each convolution and pooling step.
Input 64x64 → conv1 (kernel 3, padding 1) keeps size 64x64 → pool halves to 32x32 → conv2 keeps size 32x32 → pool halves to 16x16. With 32 channels after conv2, the output shape is torch.Size([1, 32, 16, 16]).
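The size arithmetic above can be checked without running PyTorch, using the standard output-size formula out = (in + 2·padding − kernel) // stride + 1, which applies to both convolution and pooling layers. The helper `conv_out` below is an illustrative name, not part of any library.

```python
# Sketch: spatial size after each layer of SimpleCNN, via the
# standard conv/pool output-size formula.

def conv_out(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

size = 64
size = conv_out(size, kernel=3, padding=1)   # conv1: 64 -> 64
size = conv_out(size, kernel=2, stride=2)    # pool:  64 -> 32
size = conv_out(size, kernel=3, padding=1)   # conv2: 32 -> 32
size = conv_out(size, kernel=2, stride=2)    # pool:  32 -> 16
print((1, 32, size, size))  # (1, 32, 16, 16)
```

The same formula explains why kernel_size=3 with padding=1 and stride=1 preserves spatial size, while MaxPool2d(2, 2) halves it.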
How does increasing the convolution kernel size from 3x3 to 7x7 typically affect a CNN's ability to extract features?
Think about how kernel size relates to the area of the image the filter sees.
Larger kernels see more of the image at once, capturing broader spatial features, but each filter carries more parameters, which increases model size and the risk of overfitting.
A CNN model with 10 layers achieves 85% validation accuracy. Increasing to 50 layers drops validation accuracy to 70%. What is the most likely reason?
Consider training difficulties with very deep networks.
Very deep networks can have vanishing gradients making training hard and can overfit if not regularized, causing validation accuracy to drop.
Given a deep CNN training with exploding gradients, which architectural choice is the most likely cause?
Think about what helps stabilize training in deep networks.
A deep stack of plain layers without batch normalization or residual connections is the most likely cause: activations and gradients are multiplied through every layer, so poorly scaled weights can make their magnitudes grow exponentially with depth, producing exploding gradients.
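The multiplicative growth in the answer above can be sketched the same way as the vanishing case, but with a per-layer gain above 1: the magnitude compounds exponentially with depth. The function name and the 1.1 gain are illustrative assumptions, not measurements from a real network.

```python
# Sketch: exploding gradients from multiplicative layer-to-layer scaling.
# If each layer scales the backpropagated signal by a gain g > 1
# (e.g. large weights, with ReLU passing positive values through
# unscaled), the magnitude grows exponentially with depth.

def gradient_norm(depth, per_layer_gain):
    return per_layer_gain ** depth

print(gradient_norm(50, 1.1))   # grows to over 100x
print(gradient_norm(50, 1.0))   # gain 1 stays stable at any depth
```

Batch normalization and residual connections both work by keeping this per-layer gain close to 1, which is why plain deep stacks without them are the usual suspects when gradients explode.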