Which reason best explains why Convolutional Neural Networks (CNNs) usually outperform fully connected networks on image classification tasks?
Think about how nearby pixels in an image tend to be related to each other.
CNNs use filters that slide over small regions of the image, sharing weights to detect features like edges and shapes. This local connectivity and weight sharing reduce parameters and capture spatial patterns better than fully connected layers.
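The parameter savings from weight sharing can be made concrete. As a rough illustration (the layer sizes here are chosen arbitrarily, not taken from the question), compare a fully connected layer on a flattened 28x28 grayscale image with a small convolution layer on the same input:

```python
import torch.nn as nn

# Illustrative comparison on a 28x28 grayscale input:
# a fully connected layer mapping the flattened image to 100 units
# versus a conv layer with 6 filters of size 5x5.
fc = nn.Linear(28 * 28, 100)
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

fc_params = sum(p.numel() for p in fc.parameters())      # 784*100 + 100 = 78,500
conv_params = sum(p.numel() for p in conv.parameters())  # 6*1*5*5 + 6 = 156
print(fc_params, conv_params)
```

The conv layer needs orders of magnitude fewer parameters because each 5x5 filter is reused at every spatial position instead of learning a separate weight per pixel.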
Given a grayscale image of size 28x28 and a convolution layer with 6 filters of size 5x5, stride 1, and no padding, what is the output shape?
import torch
import torch.nn as nn

input_tensor = torch.randn(1, 1, 28, 28)  # batch=1, channels=1, height=28, width=28
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=0)
output = conv(input_tensor)
print(output.shape)  # torch.Size([1, 6, 24, 24])
Output size = (Input size - Kernel size) / Stride + 1
Output height and width = (28 - 5)/1 + 1 = 24. The 6 filters give 6 output channels, and the batch size stays 1, so the output shape is (1, 6, 24, 24).
You want to classify high-resolution images with many classes. Which CNN architecture choice is best to improve accuracy while managing training time?
Think about how very deep networks can be trained effectively.
Residual connections help training very deep CNNs by allowing gradients to flow better, improving accuracy without excessive training problems. Very deep networks without residuals often suffer from vanishing gradients.
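A residual block can be sketched in a few lines. This is a minimal illustration of the idea (channel count and layer sizes are arbitrary, not from any specific architecture): the skip connection adds the input back to the convolution output, giving gradients a direct path around the conv layers.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x) with an identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # skip connection: gradients can bypass the convs

x = torch.randn(1, 16, 32, 32)
block = ResidualBlock(16)
print(block(x).shape)  # shape preserved: torch.Size([1, 16, 32, 32])
```

Note that padding=1 keeps the spatial size unchanged, which is what makes the elementwise addition `out + x` valid.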
How does increasing the convolution kernel size from 3x3 to 7x7 generally affect a CNN's ability to detect features in images?
Consider the trade-off between detail and context in feature detection.
Larger kernels cover more area, capturing broader features but increasing parameters and computation, which can lead to overfitting if not managed well.
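The parameter growth is quadratic in kernel size, which this sketch makes explicit (the 64-channel layer here is an arbitrary example, not from the question):

```python
import torch.nn as nn

# Per-filter parameter count grows quadratically with kernel size.
conv3 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
conv7 = nn.Conv2d(64, 64, kernel_size=7, padding=3)

p3 = sum(p.numel() for p in conv3.parameters())  # 64*64*3*3 + 64 = 36,928
p7 = sum(p.numel() for p in conv7.parameters())  # 64*64*7*7 + 64 = 200,768
print(p3, p7)
```

This is why modern architectures often stack small kernels instead: two 3x3 layers have a 5x5 receptive field and three have a 7x7 receptive field, at a fraction of the parameter cost of a single large kernel.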
During CNN training on image classification, you observe training accuracy steadily increasing but validation accuracy plateaus and then decreases. What does this indicate?
Think about what it means when training improves but validation worsens.
When training accuracy improves but validation accuracy drops, the model memorizes training data patterns but fails to generalize to new data, a sign of overfitting.
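A standard response to this pattern is early stopping on the validation metric. The sketch below uses a made-up validation-accuracy history (the numbers are hypothetical) to show the basic logic: stop once validation accuracy fails to improve for a set number of epochs.

```python
# Hypothetical validation-accuracy history illustrating overfitting:
# validation accuracy peaks at epoch 3 and then declines.
val_acc_history = [0.70, 0.78, 0.83, 0.85, 0.84, 0.82, 0.80]

best_acc, patience, wait = 0.0, 2, 0
stop_epoch = None
for epoch, acc in enumerate(val_acc_history):
    if acc > best_acc:
        best_acc, wait = acc, 0  # new best: reset the patience counter
    else:
        wait += 1
        if wait >= patience:     # no improvement for `patience` epochs in a row
            stop_epoch = epoch
            break
print(best_acc, stop_epoch)  # 0.85 5
```

In a real training loop you would also save the model weights at the best epoch and restore them when stopping; other remedies for overfitting include dropout, weight decay, and data augmentation.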