Computer Vision · How-To · Beginner · 4 min read

How to Use U-Net for Segmentation in Computer Vision

U-Net segments images pixel-wise using an encoder-decoder architecture joined by skip connections. Build the model, train it on images paired with labeled masks using a loss such as binary_crossentropy or Dice loss, then predict segmentation masks for new images.
📐

Syntax

The U-Net model consists of an encoder (downsampling path) and a decoder (upsampling path) connected by skip connections. The encoder extracts features, and the decoder reconstructs the segmentation mask.

Key parts include:

  • Conv2D: applies filters to extract features.
  • MaxPooling2D: reduces spatial size to capture context.
  • UpSampling2D: increases spatial size to restore resolution.
  • Concatenate: merges encoder features with decoder features for precise localization.
  • Activation: usually sigmoid for binary segmentation output.
python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Concatenate
from tensorflow.keras.models import Model

def unet(input_shape=(128, 128, 1)):
    inputs = Input(input_shape)
    # Encoder
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c1 = Conv2D(16, 3, activation='relu', padding='same')(c1)
    p1 = MaxPooling2D()(c1)

    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    c2 = Conv2D(32, 3, activation='relu', padding='same')(c2)
    p2 = MaxPooling2D()(c2)

    # Bottleneck
    c3 = Conv2D(64, 3, activation='relu', padding='same')(p2)
    c3 = Conv2D(64, 3, activation='relu', padding='same')(c3)

    # Decoder
    u1 = UpSampling2D()(c3)
    u1 = Concatenate()([u1, c2])
    c4 = Conv2D(32, 3, activation='relu', padding='same')(u1)
    c4 = Conv2D(32, 3, activation='relu', padding='same')(c4)

    u2 = UpSampling2D()(c4)
    u2 = Concatenate()([u2, c1])
    c5 = Conv2D(16, 3, activation='relu', padding='same')(u2)
    c5 = Conv2D(16, 3, activation='relu', padding='same')(c5)

    outputs = Conv2D(1, 1, activation='sigmoid')(c5)

    model = Model(inputs, outputs)
    return model
💻

Example

This example shows how to create a U-Net model, compile it with a loss function and optimizer, train on dummy data, and predict segmentation masks.

python
import numpy as np
from tensorflow.keras.optimizers import Adam

# Create U-Net model
model = unet(input_shape=(128, 128, 1))
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

# Generate dummy data: 10 grayscale images and masks
x_train = np.random.rand(10, 128, 128, 1).astype(np.float32)
y_train = (np.random.rand(10, 128, 128, 1) > 0.5).astype(np.float32)

# Train model briefly
history = model.fit(x_train, y_train, epochs=2, batch_size=2)

# Predict on new dummy data
x_test = np.random.rand(2, 128, 128, 1).astype(np.float32)
predictions = model.predict(x_test)

print('Predictions shape:', predictions.shape)
print('Sample prediction pixel value:', predictions[0, 64, 64, 0])
Output
Epoch 1/2
5/5 [==============================] - 3s 156ms/step - loss: 0.6931 - accuracy: 0.5000
Epoch 2/2
5/5 [==============================] - 1s 156ms/step - loss: 0.6929 - accuracy: 0.5000
1/1 [==============================] - 0s 28ms/step
Predictions shape: (2, 128, 128, 1)
Sample prediction pixel value: 0.4999995231628418
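The sigmoid output is a per-pixel probability, not a hard label. A common follow-up step (a minimal NumPy sketch, using the conventional 0.5 cutoff) converts the predictions into a binary mask:

```python
import numpy as np

# Stand-in for model.predict(x_test): shape (batch, height, width, 1), values in [0, 1]
predictions = np.random.rand(2, 128, 128, 1).astype(np.float32)

# Threshold the per-pixel probabilities to get a hard 0/1 mask
binary_masks = (predictions > 0.5).astype(np.uint8)

print(binary_masks.shape)       # (2, 128, 128, 1)
print(np.unique(binary_masks))  # [0 1]
```

Lowering or raising the cutoff trades recall against precision; 0.5 is only the default choice for a balanced binary problem.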
⚠️

Common Pitfalls

  • Incorrect input shape: U-Net expects 3D input (height, width, channels). Missing channel dimension causes errors.
  • Improper loss function: Use binary_crossentropy for binary masks or specialized losses like Dice loss for better segmentation.
  • Skipping skip connections: Not concatenating encoder features to decoder reduces segmentation accuracy.
  • Overfitting: U-Net can overfit small datasets; use data augmentation or regularization.
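The first pitfall is easy to fix: a grayscale image loaded as (height, width) needs an explicit channel axis, plus a batch axis, before it can be fed to the model. A minimal NumPy sketch:

```python
import numpy as np

image = np.random.rand(128, 128).astype(np.float32)  # (H, W): no channel axis yet

# Wrong: model.predict(image) fails; U-Net expects (batch, H, W, channels)
# Right: add the channel axis, then the batch axis
image = np.expand_dims(image, axis=-1)  # (128, 128, 1)
batch = np.expand_dims(image, axis=0)   # (1, 128, 128, 1)

print(batch.shape)  # (1, 128, 128, 1)
```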
python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Concatenate
from tensorflow.keras.models import Model

# Shared encoder: two pooling steps, so two upsampling steps restore the input size
inputs = Input((128, 128, 1))
c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
p1 = MaxPooling2D()(c1)
c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
p2 = MaxPooling2D()(c2)
c3 = Conv2D(64, 3, activation='relu', padding='same')(p2)

# Wrong: decoder without skip connections
u1 = UpSampling2D()(c3)
# Missing Concatenate()([u1, c2]) here
c4 = Conv2D(32, 3, activation='relu', padding='same')(u1)
u2 = UpSampling2D()(c4)
# Missing Concatenate()([u2, c1]) here
c5 = Conv2D(16, 3, activation='relu', padding='same')(u2)
outputs = Conv2D(1, 1, activation='sigmoid')(c5)
model_wrong = Model(inputs, outputs)

# Right: concatenate encoder features at each decoder stage
u1 = UpSampling2D()(c3)
u1 = Concatenate()([u1, c2])
c4 = Conv2D(32, 3, activation='relu', padding='same')(u1)
u2 = UpSampling2D()(c4)
u2 = Concatenate()([u2, c1])
c5 = Conv2D(16, 3, activation='relu', padding='same')(u2)
outputs = Conv2D(1, 1, activation='sigmoid')(c5)
model_right = Model(inputs, outputs)
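Keras does not ship a built-in Dice loss, so the Dice loss mentioned above is usually hand-rolled. A common sketch (the smoothing constant 1.0 is a conventional choice to avoid division by zero, not a fixed requirement):

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1.0):
    # Flatten both masks and compute the soft Dice overlap
    y_true_f = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred_f = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    dice = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)
    return 1.0 - dice

# Drop-in replacement for binary_crossentropy:
# model.compile(optimizer='adam', loss=dice_loss, metrics=['accuracy'])
```

Unlike binary crossentropy, which averages over all pixels, Dice loss optimizes mask overlap directly, which helps when the foreground region is small.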
📊

Quick Reference

Key tips for using U-Net:

  • Input shape: (height, width, channels), usually grayscale or RGB.
  • Use binary_crossentropy or Dice loss for training.
  • Keep skip connections to preserve spatial details.
  • Train with enough data or augment to avoid overfitting.
  • Output activation: sigmoid for binary segmentation.
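When augmenting, the same random transform must hit the image and its mask together, or the labels drift out of alignment. A minimal flip-based sketch in NumPy (real pipelines often use tf.image or a dedicated augmentation library; this is only an illustration):

```python
import numpy as np

def augment_pair(image, mask, rng):
    # Apply identical random flips to image and mask so they stay aligned
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]  # vertical flip
    return image, mask

rng = np.random.default_rng(0)
img = np.random.rand(128, 128, 1).astype(np.float32)
msk = (np.random.rand(128, 128, 1) > 0.5).astype(np.float32)
aug_img, aug_msk = augment_pair(img, msk, rng)
print(aug_img.shape, aug_msk.shape)  # (128, 128, 1) (128, 128, 1)
```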

Key Takeaways

  • U-Net segments images by combining encoder and decoder paths with skip connections.
  • Use correct input shapes and loss functions like binary crossentropy for training.
  • Skip connections are essential for accurate segmentation results.
  • Avoid overfitting by using enough data or data augmentation.
  • Output layer uses sigmoid activation for binary mask prediction.