Computer Vision · ~5 mins

U-Net architecture in Computer Vision

Introduction
U-Net helps computers find and separate objects in images by learning from examples. It is well suited to tasks like medical image analysis, where precise shapes matter.
When you want to identify exact areas of objects in images, like tumors in medical scans.
When you need to separate different parts of a photo, such as roads and buildings in satellite images.
When you want to color or label each pixel in an image for detailed understanding.
When you have limited training data but need accurate image segmentation.
When you want a model that can learn both the big picture and fine details in images.
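All of these tasks come down to the same thing: predicting a label for every pixel. A minimal NumPy sketch of what that looks like (the shapes and threshold here are purely illustrative):

```python
import numpy as np

# A tiny 4x4 "image" and a per-pixel mask: 0 = background, 1 = object
image = np.random.rand(4, 4).astype(np.float32)
mask = (image > 0.5).astype(np.int64)

# Segmentation means the mask has the same height and width as the image
assert mask.shape == image.shape
print(mask)
```

A segmentation model like U-Net learns to produce such a mask from the image alone, instead of from a hand-written threshold.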
Syntax
def unet(input_shape):
    # Layers come from tensorflow.keras (full imports are shown in the Sample Model below)
    inputs = Input(input_shape)

    # Encoder: downsampling path
    c1 = Conv2D(64, 3, activation='relu', padding='same')(inputs)
    c1 = Conv2D(64, 3, activation='relu', padding='same')(c1)
    p1 = MaxPooling2D()(c1)

    c2 = Conv2D(128, 3, activation='relu', padding='same')(p1)
    c2 = Conv2D(128, 3, activation='relu', padding='same')(c2)
    p2 = MaxPooling2D()(c2)

    # Bottleneck (a compact two-level U-Net; the original paper uses four levels)
    c5 = Conv2D(256, 3, activation='relu', padding='same')(p2)
    c5 = Conv2D(256, 3, activation='relu', padding='same')(c5)

    # Decoder: upsampling path
    u6 = Conv2DTranspose(128, 2, strides=2, padding='same')(c5)
    u6 = concatenate([u6, c2])  # skip connection from the encoder
    c6 = Conv2D(128, 3, activation='relu', padding='same')(u6)
    c6 = Conv2D(128, 3, activation='relu', padding='same')(c6)

    u7 = Conv2DTranspose(64, 2, strides=2, padding='same')(c6)
    u7 = concatenate([u7, c1])  # skip connection from the encoder
    c7 = Conv2D(64, 3, activation='relu', padding='same')(u7)
    c7 = Conv2D(64, 3, activation='relu', padding='same')(c7)

    outputs = Conv2D(1, 1, activation='sigmoid')(c7)  # 1-channel binary mask
    model = Model(inputs, outputs)
    return model
U-Net has two main parts: encoder (downsampling) and decoder (upsampling).
Skip connections copy features from encoder to decoder to keep details.
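The size bookkeeping behind this can be checked without building the network. A small sketch, assuming 'same' padding (so convolutions keep the spatial size), 2x2 pooling that halves it, and stride-2 transposed convolutions that double it back:

```python
# Spatial size through a two-level encoder/decoder
def encoder_sizes(size, levels):
    sizes = [size]
    for _ in range(levels):
        size //= 2          # MaxPooling2D halves height and width
        sizes.append(size)
    return sizes

def decoder_sizes(size, levels):
    sizes = [size]
    for _ in range(levels):
        size *= 2           # Conv2DTranspose with stride 2 doubles them back
        sizes.append(size)
    return sizes

enc = encoder_sizes(128, 2)       # [128, 64, 32]
dec = decoder_sizes(enc[-1], 2)   # [32, 64, 128]

# Each decoder stage matches an encoder stage in size,
# so the skip connections can concatenate their feature maps
assert dec == enc[::-1]
```

This mirror symmetry is exactly what lets `concatenate([u6, c2])` and `concatenate([u7, c1])` work in the code above.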
Examples
Create a U-Net model for 128x128 grayscale images and show its layers.
# unet() builds its own Input layer, so only the shape is needed
model = unet((128, 128, 1))
model.summary()
Change the output layer to predict 3 classes instead of 1 for multi-class segmentation.
# Inside unet(), replace the final 1x1 convolution:
outputs = Conv2D(3, 1, activation='softmax')(c7)
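With a 3-class softmax output, the training masks change too: each pixel needs a class index, or a one-hot vector if you train with `categorical_crossentropy`. A hedged NumPy sketch (the shapes are chosen for illustration):

```python
import numpy as np

# Integer masks: each pixel holds a class index 0, 1, or 2
y_int = np.random.randint(0, 3, size=(10, 128, 128))

# One-hot masks for categorical_crossentropy: shape (N, H, W, 3)
y_onehot = np.eye(3, dtype=np.float32)[y_int]

assert y_onehot.shape == (10, 128, 128, 3)
# Every pixel's class vector sums to 1
assert np.allclose(y_onehot.sum(axis=-1), 1.0)
```

If you keep the integer masks instead, Keras's `sparse_categorical_crossentropy` loss accepts them directly, with no one-hot step.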
Sample Model
This code builds a small U-Net, trains it on random data for 2 epochs, and prints the loss, accuracy, and prediction shape.
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Define U-Net model
def unet(input_shape):
    inputs = Input(input_shape)
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c1 = Conv2D(16, 3, activation='relu', padding='same')(c1)
    p1 = MaxPooling2D()(c1)

    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    c2 = Conv2D(32, 3, activation='relu', padding='same')(c2)
    p2 = MaxPooling2D()(c2)

    c3 = Conv2D(64, 3, activation='relu', padding='same')(p2)
    c3 = Conv2D(64, 3, activation='relu', padding='same')(c3)

    u4 = Conv2DTranspose(32, 2, strides=2, padding='same')(c3)
    u4 = concatenate([u4, c2])
    c4 = Conv2D(32, 3, activation='relu', padding='same')(u4)
    c4 = Conv2D(32, 3, activation='relu', padding='same')(c4)

    u5 = Conv2DTranspose(16, 2, strides=2, padding='same')(c4)
    u5 = concatenate([u5, c1])
    c5 = Conv2D(16, 3, activation='relu', padding='same')(u5)
    c5 = Conv2D(16, 3, activation='relu', padding='same')(c5)

    outputs = Conv2D(1, 1, activation='sigmoid')(c5)
    model = Model(inputs, outputs)
    return model

# Create model
model = unet((64, 64, 1))
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

# Create dummy data: 10 images 64x64 with 1 channel
x_train = np.random.rand(10, 64, 64, 1).astype(np.float32)
y_train = (x_train > 0.5).astype(np.float32)  # dummy masks

# Train model for 2 epochs
history = model.fit(x_train, y_train, epochs=2, batch_size=2, verbose=0)

# Predict on one image
pred = model.predict(x_train[:1])
print(f"Loss after 2 epochs: {history.history['loss'][-1]:.4f}")
print(f"Accuracy after 2 epochs: {history.history['accuracy'][-1]:.4f}")
print(f"Prediction shape: {pred.shape}")
Important Notes
U-Net can work well even with small datasets, especially when paired with data augmentation, because its skip connections reuse encoder features efficiently.
Skip connections help keep image details during upsampling.
Output activation is usually sigmoid for binary masks or softmax for multiple classes.
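Turning those raw model outputs into hard masks is a one-liner in both cases. A minimal sketch (the values and shapes are illustrative, not model outputs):

```python
import numpy as np

# Binary case: sigmoid output in [0, 1] -> threshold at 0.5
sigmoid_out = np.array([[0.1, 0.7],
                        [0.4, 0.9]])
binary_mask = (sigmoid_out > 0.5).astype(np.uint8)
print(binary_mask)       # [[0 1]
                         #  [0 1]]

# Multi-class case: softmax output (H, W, C) -> argmax over the class channel
softmax_out = np.array([[[0.7, 0.2, 0.1],
                         [0.1, 0.8, 0.1]]])
class_mask = softmax_out.argmax(axis=-1)
print(class_mask)        # [[0 1]]
```

The 0.5 threshold is the common default; in practice it can be tuned on a validation set.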
Summary
U-Net is an encoder-decoder model for finding exact object shapes in images.
The encoder shrinks the image to learn features; the decoder grows it back to the original size.
Skip connections link the two paths to preserve detail and improve accuracy.