Given an input image of size 64x64 with 3 color channels, a convolutional layer uses 16 filters of size 3x3, stride 1, and padding 1. What will be the output shape of this convolutional layer?
Remember that padding of 1 keeps the spatial dimensions the same when stride is 1.
Padding of 1 adds a one-pixel border around the input, so with a 3x3 kernel and stride 1 the output spatial size remains 64x64. The number of filters determines the output depth, which is 16. The output shape is therefore 64x64x16.
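To sanity-check the arithmetic, here is a minimal sketch of the standard convolution output-size formula in plain Python (the helper name conv_out is my own):

```python
def conv_out(size, kernel, stride, padding):
    # Standard formula: floor((size - kernel + 2*padding) / stride) + 1
    return (size - kernel + 2 * padding) // stride + 1

# 64x64 input, 3x3 kernel, stride 1, padding 1 -> spatial size preserved
side = conv_out(64, 3, 1, 1)
print((side, side, 16))  # (64, 64, 16): depth comes from the 16 filters
```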
What is the output shape after applying a 2x2 max pooling layer with stride 2 on an input tensor of shape (32, 32, 10)?
Max pooling reduces the spatial dimensions by a factor of the stride when the filter size equals the stride.
With a 2x2 filter and stride 2, the height and width are halved while the depth remains the same, so the output shape is (16, 16, 10).
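The pooling case follows the same size formula with zero padding, as is typical for pooling layers; pool_out is an illustrative helper of my own:

```python
def pool_out(size, kernel, stride):
    # floor((size - kernel) / stride) + 1, the standard pooling size formula
    return (size - kernel) // stride + 1

side = pool_out(32, 2, 2)
print((side, side, 10))  # (16, 16, 10): spatial dims halved, depth unchanged
```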
You want to design a CNN layer to detect edges in images. Which kernel size is most appropriate for this task?
Edge detection usually requires small kernels to capture local gradients.
3x3 kernels are commonly used for edge detection because they capture local intensity changes effectively without much computation.
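As an illustration, a 3x3 Sobel kernel responds strongly wherever intensity changes between neighboring pixels. This sketch convolves a tiny image containing a vertical edge, using a "valid" convolution written in plain Python (no library assumed):

```python
# 3x3 horizontal-gradient (Sobel-x) kernel
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 5x5 image: dark on the left, bright on the right (a vertical edge)
image = [[0, 0, 1, 1, 1] for _ in range(5)]

def conv2d_valid(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(kernel[u][v] * img[i + u][j + v]
                            for u in range(kh) for v in range(kw))
    return out

response = conv2d_valid(image, sobel_x)
print(response)  # strong values (4) where the edge is, 0 in uniform regions
```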
A CNN model training on image classification shows training accuracy of 98% but validation accuracy of 75%. What is the most likely explanation?
High training accuracy combined with much lower validation accuracy usually means the model is memorizing the training data.
Overfitting: the model fits the training data too closely and fails to generalize to new data.
Consider this PyTorch CNN layer definition:
nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=2, padding=2)
What is the output size for an input image of size 64x64? Is there an error in the output size calculation?
Use the formula: Output = floor((Input - Kernel + 2*Padding) / Stride) + 1
Output size = floor((64 - 3 + 2*2) / 2) + 1 = floor(65 / 2) + 1 = 32 + 1 = 33, so the output is 33x33 with 32 channels. The arithmetic itself is correct, but padding of 2 is unusually large for a 3x3 kernel with stride 2: it makes the output (33x33) larger than half the input size, whereas padding of 1 would give exactly 32x32.
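A quick check of the formula for this layer, alongside the padding=1 alternative (the helper name conv_out is my own):

```python
def conv_out(size, kernel, stride, padding):
    # floor((size - kernel + 2*padding) / stride) + 1
    return (size - kernel + 2 * padding) // stride + 1

print(conv_out(64, 3, 2, 2))  # 33: padding 2 overshoots half the input size
print(conv_out(64, 3, 2, 1))  # 32: padding 1 halves the input exactly
```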