Fully Convolutional Networks (FCNs) are popular for image segmentation tasks. What is the key advantage of using an FCN compared to traditional CNNs with fully connected layers?
Think about how FCNs handle input size and output size compared to traditional CNNs.
FCNs replace fully connected layers with convolutional layers, allowing them to accept images of any size and output segmentation maps of corresponding spatial dimensions. This flexibility is a key advantage for dense prediction tasks like segmentation.
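This size flexibility is easy to verify in code. Below is a minimal sketch (the layer sizes and the 10-class output head are illustrative, not from the question): the same all-convolutional network accepts two different input resolutions and produces outputs with matching spatial dimensions.

```python
import torch
import torch.nn as nn

# Minimal all-convolutional sketch (hypothetical layer sizes):
# no fully connected layers, so any input height/width is accepted.
fcn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=1),  # per-pixel class scores
)

small = fcn(torch.randn(1, 3, 64, 64))
large = fcn(torch.randn(1, 3, 128, 96))
print(small.shape)  # torch.Size([1, 10, 64, 64])
print(large.shape)  # torch.Size([1, 10, 128, 96])
```

A traditional CNN with a fully connected head would raise a shape error on the second input, because the flattened feature vector would no longer match the linear layer's fixed input size.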
Consider an input tensor of shape (batch_size=1, channels=3, height=64, width=64) passed through a convolutional layer in an FCN with 16 filters, kernel size 3, stride 1, and padding 1. What will be the output shape?
import torch
import torch.nn as nn

input_tensor = torch.randn(1, 3, 64, 64)
conv_layer = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
output = conv_layer(input_tensor)
output.shape
Recall how padding affects output size in convolution layers.
Using the convolution output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1 = (64 + 2 − 3)/1 + 1 = 64, so the output height and width remain the same as the input (64×64). The number of output channels equals the number of filters, giving an output shape of (1, 16, 64, 64).
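The output-size formula can be checked with a small helper (the function name `conv_out` is mine, for illustration):

```python
# Conv2d output size: out = floor((in + 2*padding - kernel) / stride) + 1
def conv_out(size, kernel, stride, padding):
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out(64, 3, 1, 1))  # 64: kernel 3 with padding 1 preserves size
print(conv_out(64, 3, 2, 1))  # 32: stride 2 roughly halves it
```

The first call reproduces the "same" convolution in the question; the second shows how a stride of 2 would instead downsample the feature map.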
You want to build an FCN for semantic segmentation on a dataset with 10 classes. Which architecture choice is best to produce pixel-wise class predictions?
Think about how to get class scores for each pixel in the input image.
FCNs use convolutional layers to keep spatial information. A 1x1 convolution with 10 filters outputs class scores for each pixel, enabling pixel-wise classification for segmentation.
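A short sketch of this classification head (the 256-channel backbone feature map is a hypothetical stand-in for whatever the encoder produces):

```python
import torch
import torch.nn as nn

# Hypothetical backbone output: 256 feature channels at 32x32 resolution.
features = torch.randn(1, 256, 32, 32)

# 1x1 convolution maps the 256 channels to 10 class scores at every pixel.
classifier = nn.Conv2d(256, 10, kernel_size=1)
scores = classifier(features)
print(scores.shape)  # torch.Size([1, 10, 32, 32])

# Taking the argmax over the channel dimension yields a per-pixel label map.
pred = scores.argmax(dim=1)
print(pred.shape)  # torch.Size([1, 32, 32])
```

Because the 1×1 convolution acts independently at each spatial location, it is effectively a shared linear classifier applied to every pixel's feature vector.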
In an FCN, which upsampling method is most likely to produce smoother and more accurate segmentation maps?
Consider which method interpolates pixel values smoothly.
Bilinear interpolation uses weighted averages of nearby pixels, producing smoother upsampled images than nearest neighbor, which simply copies pixels.
You trained an FCN for semantic segmentation. Which metric best measures how well the predicted segmentation matches the ground truth at the pixel level?
Think about a metric that balances false positives and false negatives per class.
IoU (Intersection over Union) measures the overlap between the predicted and ground-truth segmentation masks per class, penalizing both false positives and false negatives, which makes it well suited for segmentation evaluation. Plain pixel accuracy, by contrast, can look high even when small or rare classes are segmented poorly.
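A minimal sketch of per-class IoU on label maps (the helper name `iou_per_class` and the toy 2×2 masks are mine, for illustration):

```python
import torch

def iou_per_class(pred, target, num_classes):
    # Per-class IoU = |pred ∩ target| / |pred ∪ target| over pixels.
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        inter = (p & t).sum().item()
        union = (p | t).sum().item()
        # Undefined (NaN) when the class appears in neither mask.
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

pred = torch.tensor([[0, 0],
                     [1, 1]])
target = torch.tensor([[0, 1],
                       [1, 1]])
print(iou_per_class(pred, target, 2))  # class 0: 1/2, class 1: 2/3
```

Averaging these per-class scores gives mean IoU (mIoU), the standard summary metric for semantic segmentation benchmarks.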