Why do convolutional neural networks (CNNs) use filters (also called kernels) when processing images?
Think about how looking at small parts of a picture helps you recognize shapes.
CNN filters slide over small parts of an image to find simple patterns like edges or textures. These patterns help the network understand the image step by step.
Given a grayscale image tensor of shape (1, 1, 28, 28) and a convolution layer with 6 filters of size 3x3, stride 1, and padding 0, what is the output shape after applying the convolution?
import torch
import torch.nn as nn

image = torch.randn(1, 1, 28, 28)  # batch=1, channels=1, height=28, width=28
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=1, padding=0)
output = conv(image)
print(output.shape)  # torch.Size([1, 6, 26, 26])
Output size = (Input size - Kernel size + 2 * Padding) / Stride + 1
With input 28x28, kernel 3x3, stride 1, padding 0: output height and width = (28 - 3 + 2*0) / 1 + 1 = 26. The number of output channels equals the number of filters, so the full output shape is (1, 6, 26, 26).
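The formula can be wrapped in a small helper for quick checks (a sketch; the function name is illustrative, not a library API):

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(28, 3))             # 26, matching the worked example
print(conv_output_size(28, 3, padding=1))  # 28, "same" padding preserves size
```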
You want to build a model to recognize handwritten digits from images. Which model type is best suited to detect spatial patterns like edges and curves?
Think about which model type is designed to understand images by looking at small parts.
CNNs are designed to detect spatial patterns by applying filters over image regions, making them ideal for image tasks like digit recognition.
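A minimal digit-classifier sketch in PyTorch, assuming 28x28 grayscale inputs (layer sizes and channel counts are illustrative choices, not a prescribed architecture):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=3),  # filters detect local patterns like edges
    nn.ReLU(),
    nn.MaxPool2d(2),                 # downsample 26x26 feature maps to 13x13
    nn.Flatten(),
    nn.Linear(6 * 13 * 13, 10),      # one score per digit class 0-9
)

x = torch.randn(1, 1, 28, 28)
print(model(x).shape)  # torch.Size([1, 10])
```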
How does increasing the kernel size in a CNN layer affect the spatial patterns the model can detect?
Think about how a bigger window sees more of the image at once.
Larger kernels cover bigger regions of the image, so they can detect larger-scale patterns; with no padding they also shrink the output feature map more, because there are fewer valid sliding positions.
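The shrinking effect can be seen directly by applying kernels of increasing size to the same input (a sketch using the same 28x28 setup as above):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)
for k in (3, 5, 7):
    out = nn.Conv2d(1, 6, kernel_size=k)(x)  # stride 1, padding 0 by default
    print(k, tuple(out.shape))  # spatial size shrinks: 26, 24, 22
```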
During CNN training on image data, you observe the training loss steadily decreases but the validation accuracy stops improving and fluctuates. What does this indicate?
Think about what it means when training improves but validation does not.
When training loss keeps decreasing while validation accuracy stalls or fluctuates, the model is likely memorizing training-set patterns rather than learning features that generalize: this is overfitting.
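One common response to this pattern is early stopping: stop training once validation accuracy has not improved for a few epochs. A minimal sketch of the patience logic (the accuracy values and patience threshold are illustrative):

```python
val_accuracies = [0.90, 0.91, 0.915, 0.913, 0.914, 0.912]  # plateaus, then fluctuates
best, patience, wait = 0.0, 2, 0

for epoch, acc in enumerate(val_accuracies):
    if acc > best:
        best, wait = acc, 0      # validation improved: reset the counter
    else:
        wait += 1                # no improvement this epoch
        if wait > patience:
            print(f"stopping at epoch {epoch}; best validation accuracy {best}")
            break
```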