An FCN helps computers understand images by labeling each pixel. It turns images into meaningful maps, like coloring each part of a photo.
FCN (Fully Convolutional Network) in Computer Vision
Introduction
When you want to find objects in a photo and know exactly where they are.
When you need to separate different parts of an image, like roads and buildings on a map.
When you want to help a robot see and understand its surroundings.
When you want to analyze medical images to find areas of interest.
When you want to create special effects by changing parts of an image.
Syntax
import torch.nn as nn

class FCN(nn.Module):
    def __init__(self, num_classes):
        super(FCN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.conv2(x)
        return x
FCNs use only convolution layers, no fully connected layers.
Output is a map with the same spatial size as input, but with class scores per pixel.
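The claim that the output keeps the input's spatial size follows from the standard convolution output-size formula. Here is a minimal plain-Python sketch of that formula (the function `conv_out_size` is illustrative, not part of PyTorch):

```python
def conv_out_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a 2D convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# A 3x3 kernel with padding=1 and stride=1 keeps the size unchanged,
# which is why the FCN above outputs a map with the same height and
# width as its input.
print(conv_out_size(224, kernel=3, padding=1))   # 224
print(conv_out_size(224, kernel=1, padding=0))   # 224 (1x1 conv also keeps size)
print(conv_out_size(224, kernel=3, padding=0))   # 222 (no padding shrinks the map)
```

This is the reason both convolutions in the syntax example use either padding=1 with a 3x3 kernel or a 1x1 kernel: each choice preserves height and width.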
Examples
A smaller FCN variant that uses fewer filters in its first convolution layer.
self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
self.conv2 = nn.Conv2d(32, num_classes, kernel_size=1)
Shows how output keeps spatial dimensions for pixel-wise prediction.
output = model(input_tensor)
# output shape: (batch_size, num_classes, height, width)

Sample Model
This code builds a simple FCN with two convolution layers. It takes a small image and predicts class scores for each pixel. Then it converts scores to probabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCN(nn.Module):
    def __init__(self, num_classes):
        super(FCN, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.conv2(x)
        return x

# Create model for 2 classes
model = FCN(num_classes=2)

# Create dummy input: batch of 1 image, 3 channels, 4x4 pixels
input_tensor = torch.randn(1, 3, 4, 4)

# Forward pass
output = model(input_tensor)

# Check output shape
print(f"Output shape: {output.shape}")

# Apply softmax to get probabilities per pixel
probabilities = F.softmax(output, dim=1)

# Print probabilities for first pixel
print(f"Probabilities at pixel (0,0): {probabilities[0, :, 0, 0]}")
Output
Output shape: torch.Size([1, 2, 4, 4])
(The printed per-pixel probabilities vary from run to run, since both the model weights and the dummy input are random.)
Important Notes
FCNs preserve the spatial size of the input by padding their convolutions (e.g., padding=1 for a 3x3 kernel with stride 1).
A 1x1 convolution acts as a per-pixel classifier, mapping each pixel's feature vector to class scores independently of its neighbors.
Softmax is used to turn raw scores into probabilities for each class per pixel.
Summary
FCNs label every pixel in an image using only convolution layers.
They are useful for tasks like segmentation where location matters.
Output shape matches input image size but has class scores per pixel.
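To go from per-pixel class scores to an actual segmentation map, each pixel is assigned the class with the highest score. A plain-Python sketch over a tiny 2x2 score map (the score values are illustrative; in PyTorch this is output.argmax(dim=1)):

```python
# scores[c][y][x]: class-score maps for 2 classes on a 2x2 image (illustrative values)
scores = [
    [[0.9, 0.2],
     [0.1, 0.8]],   # class 0
    [[0.1, 0.8],
     [0.9, 0.2]],   # class 1
]

height, width = 2, 2
# For each pixel, pick the class index with the highest score
label_map = [
    [max(range(len(scores)), key=lambda c: scores[c][y][x]) for x in range(width)]
    for y in range(height)
]
print(label_map)  # [[0, 1], [1, 0]]
```

The resulting label map has the same height and width as the input, with one class label per pixel, which is exactly the segmentation output an FCN is used for.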