FCN (Fully Convolutional Network) in Computer Vision - Model Pipeline Trace

Model Pipeline - FCN (Fully Convolutional Network)

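A Fully Convolutional Network (FCN) is a neural network built only from convolution, pooling, and upsampling layers, with no fully connected layers, so its output keeps the spatial layout of the input image. It assigns a class to every pixel (semantic segmentation), which tells a computer which parts of a picture belong to which object, like coloring each pixel as 'car' or 'road'.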

Data Flow - 7 Stages
Shapes are written as batch x height x width x channels.

1. Input Image
   Input shape:  1 x 256 x 256 x 3 (1 image, 256 height, 256 width, 3 channels)
   Operation:    Raw image loaded for processing
   Output shape: 1 x 256 x 256 x 3
   Example:      A 256x256 color photo with red, green, and blue channels

2. Convolutional Layers
   Input shape:  1 x 256 x 256 x 3
   Operation:    Apply filters to detect edges and shapes
   Output shape: 1 x 256 x 256 x 64
   Example:      Feature maps highlighting edges and textures

3. Pooling Layers
   Input shape:  1 x 256 x 256 x 64
   Operation:    Reduce spatial size to focus on important features
   Output shape: 1 x 128 x 128 x 64
   Example:      Smaller feature maps summarizing regions

4. More Convolutional Layers
   Input shape:  1 x 128 x 128 x 64
   Operation:    Extract deeper features
   Output shape: 1 x 128 x 128 x 128
   Example:      Feature maps capturing complex shapes

5. Upsampling Layers
   Input shape:  1 x 128 x 128 x 128
   Operation:    Increase spatial size back to the original image size
   Output shape: 1 x 256 x 256 x 128
   Example:      Upscaled feature maps aligned with the input size

6. Final Convolution
   Input shape:  1 x 256 x 256 x 128
   Operation:    Produce pixel-wise class scores
   Output shape: 1 x 256 x 256 x 3
   Example:      Scores for 3 classes per pixel (e.g., background, object1, object2)

7. Softmax Activation
   Input shape:  1 x 256 x 256 x 3
   Operation:    Convert scores to probabilities per pixel
   Output shape: 1 x 256 x 256 x 3
   Example:      Probabilities summing to 1 for each pixel
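
The shape changes across the seven stages can be double-checked with a little arithmetic. The sketch below is only shape bookkeeping, not a trained network. It assumes stride-1 convolutions with 'same' padding, a pooling factor of 2, and an upsampling factor of 2; the trace does not state these settings, but they are consistent with the shapes it lists. The function names are illustrative.

```python
def conv_same(shape, out_channels):
    """Stride-1 convolution with 'same' padding: only the channel count changes."""
    n, h, w, _ = shape
    return (n, h, w, out_channels)


def pool(shape, factor=2):
    """Pooling with window and stride `factor`: height and width shrink."""
    n, h, w, c = shape
    return (n, h // factor, w // factor, c)


def upsample(shape, factor=2):
    """Upsampling by `factor`: height and width grow, channels are unchanged."""
    n, h, w, c = shape
    return (n, h * factor, w * factor, c)


def trace_shapes(input_shape=(1, 256, 256, 3), num_classes=3):
    """Return (stage name, output shape) for the seven stages in the trace."""
    stages = [("Input Image", input_shape)]
    x = conv_same(input_shape, 64)
    stages.append(("Convolutional Layers", x))
    x = pool(x)
    stages.append(("Pooling Layers", x))
    x = conv_same(x, 128)
    stages.append(("More Convolutional Layers", x))
    x = upsample(x)
    stages.append(("Upsampling Layers", x))
    x = conv_same(x, num_classes)
    stages.append(("Final Convolution", x))
    stages.append(("Softmax Activation", x))  # softmax keeps the shape
    return stages


for name, shape in trace_shapes():
    print(f"{name:<26} {shape}")
```

Because there are no fully connected layers, the same bookkeeping works for other input sizes, for example `trace_shapes(input_shape=(1, 512, 512, 3))`.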
Training Trace - Epoch by Epoch
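Loss by epoch (one * per 0.05 of loss)
Epoch 1 | ************************ 1.2
Epoch 2 | ******************       0.9
Epoch 3 | **************           0.7
Epoch 4 | ***********              0.55
Epoch 5 | *********                0.45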
Epoch   Loss ↓   Accuracy ↑   Observation
1       1.2      0.45         Model starts learning basic features
2       0.9      0.60         Improved pixel classification
3       0.7      0.72         Model captures more details
4       0.55     0.80         Better segmentation boundaries
5       0.45     0.85         Model converging well
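
The trace does not say which loss or accuracy metric produced these numbers. A common choice for segmentation is pixel-wise cross-entropy for the loss and pixel accuracy (the fraction of correctly labeled pixels) for the metric. The sketch below assumes those two definitions and uses a made-up 2x2 image with 3 classes; the function names and the toy values are illustrative only.

```python
import math


def pixel_cross_entropy(probs, labels):
    """Mean negative log-probability of the true class over all pixels.

    probs:  rows of pixels, each pixel a list of class probabilities.
    labels: rows of integer class ids, same height and width as probs.
    """
    total, count = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for pixel_probs, true_class in zip(prob_row, label_row):
            total += -math.log(pixel_probs[true_class])
            count += 1
    return total / count


def pixel_accuracy(probs, labels):
    """Fraction of pixels whose most probable class matches the label."""
    correct, count = 0, 0
    for prob_row, label_row in zip(probs, labels):
        for pixel_probs, true_class in zip(prob_row, label_row):
            predicted = max(range(len(pixel_probs)), key=pixel_probs.__getitem__)
            correct += int(predicted == true_class)
            count += 1
    return correct / count


# Toy 2x2 prediction with 3 class probabilities per pixel (hypothetical values).
probs = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.3, 0.3, 0.4], [0.5, 0.4, 0.1]],
]
labels = [
    [0, 1],
    [2, 1],
]

print("loss:", round(pixel_cross_entropy(probs, labels), 4))
print("accuracy:", pixel_accuracy(probs, labels))  # 3 of 4 pixels correct -> 0.75
```

Under these definitions, loss falls as the model puts more probability on the correct class, and accuracy rises as more pixels get the right top class, which matches the direction of the arrows in the table header.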
Prediction Trace - 7 Layers
Layer 1: Input Image
Layer 2: Convolutional Layer
Layer 3: Pooling Layer
Layer 4: More Convolutional Layers
Layer 5: Upsampling Layer
Layer 6: Final Convolution
Layer 7: Softmax Activation
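
The last two layers of the trace turn raw class scores into per-pixel probabilities. The sketch below shows that step on a tiny 2x2 score map, assuming the three example classes from stage 6. Picking the most probable class per pixel (argmax) is not listed as a layer in the trace; it is included here as the usual way to turn probabilities into a label map. All names and score values are hypothetical.

```python
import math

CLASS_NAMES = ["background", "object1", "object2"]  # example classes from stage 6


def softmax(scores):
    """Convert one pixel's class scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_labels(score_map):
    """Apply softmax at every pixel, then pick the most probable class."""
    probs = [[softmax(pixel) for pixel in row] for row in score_map]
    labels = [
        [max(range(len(p)), key=p.__getitem__) for p in row]
        for row in probs
    ]
    return probs, labels


# Toy 2x2 score map with 3 class scores per pixel (hypothetical values).
score_map = [
    [[2.0, 0.5, 0.1], [0.2, 3.0, 0.3]],
    [[0.1, 0.2, 1.5], [1.0, 0.9, 0.8]],
]

probs, labels = predict_labels(score_map)
for row in labels:
    print([CLASS_NAMES[i] for i in row])
```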
Model Quiz - 3 Questions
Test your understanding
What is the main purpose of the upsampling layers in an FCN?
A. To apply activation functions like ReLU
B. To reduce the number of channels in the feature maps
C. To increase the spatial size of feature maps back to the input image size
D. To split the image into smaller patches
Key Insight
Fully Convolutional Networks label each pixel by combining convolution (to extract features), pooling (to shrink the feature maps), and upsampling (to restore the input resolution). Training lowers the loss and raises accuracy: in the trace above, loss falls from 1.2 to 0.45 and accuracy rises from 0.45 to 0.85 over five epochs.
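
To make the pooling and upsampling pair concrete, here is a minimal sketch on a single 4x4 feature map. It assumes 2x2 max pooling and nearest-neighbor upsampling; the trace does not say which variants are used, and real FCNs often use a learned upsampling layer (such as a transposed convolution) instead. The function names and values are illustrative.

```python
def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 block."""
    height, width = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


def upsample_nearest_2x(fmap):
    """Repeat every value twice along both height and width."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent lists
    return out


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]

pooled = max_pool_2x2(feature_map)       # 4x4 -> 2x2: [[4, 2], [2, 7]]
restored = upsample_nearest_2x(pooled)   # 2x2 -> 4x4

for row in restored:
    print(row)
```

The restored map has the original 4x4 size, but each 2x2 block now holds a single repeated value. Upsampling brings back the resolution, not the detail that pooling discarded.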