An FCN helps computers understand images by labeling each pixel. It turns images into meaningful maps, like coloring each part of a photo.
FCN (Fully Convolutional Network) in Computer Vision
Introduction
When you want to find objects in a photo and know exactly where they are.
When you need to separate different parts of an image, like roads and buildings on a map.
When you want to help a robot see and understand its surroundings.
When you want to analyze medical images to find areas of interest.
When you want to create special effects by changing parts of an image.
Syntax
import torch.nn as nn

class FCN(nn.Module):
    def __init__(self, num_classes):
        super(FCN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.conv2(x)
        return x
FCNs use only convolution layers, no fully connected layers.
Output is a map with the same spatial size as input, but with class scores per pixel.
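The claim that the output keeps the input's spatial size follows from the standard convolution output-size formula. Here is a minimal plain-Python sketch of that formula (the function `conv_out_size` is illustrative, not part of PyTorch):

```python
def conv_out_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a 2D convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# A 3x3 kernel with padding=1 and stride=1 keeps the size unchanged,
# which is why the FCN above outputs a map with the same height and
# width as its input.
print(conv_out_size(224, kernel=3, padding=1))   # 224
print(conv_out_size(224, kernel=1, padding=0))   # 224 (1x1 conv also keeps size)
print(conv_out_size(224, kernel=3, padding=0))   # 222 (no padding shrinks the map)
```

This is the reason both convolutions in the syntax example use either padding=1 with a 3x3 kernel or a 1x1 kernel: each choice preserves height and width.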
Examples
A smaller FCN variant that uses fewer filters in its first convolution layer.
self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
self.conv2 = nn.Conv2d(32, num_classes, kernel_size=1)
Shows how output keeps spatial dimensions for pixel-wise prediction.
output = model(input_tensor)
# output shape: (batch_size, num_classes, height, width)

Sample Model
This code builds a simple FCN with two convolution layers. It takes a small image and predicts class scores for each pixel. Then it converts scores to probabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCN(nn.Module):
    def __init__(self, num_classes):
        super(FCN, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.conv2(x)
        return x

# Create model for 2 classes
model = FCN(num_classes=2)

# Create dummy input: batch of 1 image, 3 channels, 4x4 pixels
input_tensor = torch.randn(1, 3, 4, 4)

# Forward pass
output = model(input_tensor)

# Check output shape
print(f"Output shape: {output.shape}")

# Apply softmax to get probabilities per pixel
probabilities = F.softmax(output, dim=1)

# Print probabilities for first pixel
print(f"Probabilities at pixel (0,0): {probabilities[0, :, 0, 0]}")
Output
Output shape: torch.Size([1, 2, 4, 4])
(The printed per-pixel probabilities vary from run to run, since both the model weights and the dummy input are random.)
Important Notes
FCNs preserve the spatial size of the input by padding their convolutions (e.g., padding=1 for a 3x3 kernel with stride 1).
A 1x1 convolution acts as a per-pixel classifier, mapping each pixel's feature vector to class scores independently of its neighbors.
Softmax is used to turn raw scores into probabilities for each class per pixel.
Summary
FCNs label every pixel in an image using only convolution layers.
They are useful for tasks like segmentation where location matters.
Output shape matches input image size but has class scores per pixel.
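To go from per-pixel class scores to an actual segmentation map, each pixel is assigned the class with the highest score. A plain-Python sketch over a tiny 2x2 score map (the score values are illustrative; in PyTorch this is output.argmax(dim=1)):

```python
# scores[c][y][x]: class-score maps for 2 classes on a 2x2 image (illustrative values)
scores = [
    [[0.9, 0.2],
     [0.1, 0.8]],   # class 0
    [[0.1, 0.8],
     [0.9, 0.2]],   # class 1
]

height, width = 2, 2
# For each pixel, pick the class index with the highest score
label_map = [
    [max(range(len(scores)), key=lambda c: scores[c][y][x]) for x in range(width)]
    for y in range(height)
]
print(label_map)  # [[0, 1], [1, 0]]
```

The resulting label map has the same height and width as the input, with one class label per pixel, which is exactly the segmentation output an FCN is used for.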