How to Use U-Net for Segmentation in Computer Vision
Use U-Net by building its encoder-decoder architecture to segment images pixel-wise. Train the model on labeled images with masks using a loss like binary_crossentropy or dice_loss, then predict segmentation masks on new images.

Syntax
The U-Net model consists of an encoder (downsampling path) and a decoder (upsampling path) connected by skip connections. The encoder extracts features, and the decoder reconstructs the segmentation mask.
Key parts include:
- Conv2D: applies filters to extract features.
- MaxPooling2D: reduces spatial size to capture context.
- UpSampling2D: increases spatial size to restore resolution.
- Concatenate: merges encoder features with decoder features for precise localization.
- Activation: usually sigmoid for binary segmentation output.
```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Concatenate
from tensorflow.keras.models import Model

def unet(input_shape=(128, 128, 1)):
    inputs = Input(input_shape)

    # Encoder
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c1 = Conv2D(16, 3, activation='relu', padding='same')(c1)
    p1 = MaxPooling2D()(c1)
    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    c2 = Conv2D(32, 3, activation='relu', padding='same')(c2)
    p2 = MaxPooling2D()(c2)

    # Bottleneck
    c3 = Conv2D(64, 3, activation='relu', padding='same')(p2)
    c3 = Conv2D(64, 3, activation='relu', padding='same')(c3)

    # Decoder
    u1 = UpSampling2D()(c3)
    u1 = Concatenate()([u1, c2])
    c4 = Conv2D(32, 3, activation='relu', padding='same')(u1)
    c4 = Conv2D(32, 3, activation='relu', padding='same')(c4)
    u2 = UpSampling2D()(c4)
    u2 = Concatenate()([u2, c1])
    c5 = Conv2D(16, 3, activation='relu', padding='same')(u2)
    c5 = Conv2D(16, 3, activation='relu', padding='same')(c5)

    outputs = Conv2D(1, 1, activation='sigmoid')(c5)
    model = Model(inputs, outputs)
    return model
```
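The dice_loss mentioned above is not a built-in Keras loss, so it must be defined by hand. The sketch below is one common formulation; the smoothing constant and the flatten-then-overlap strategy are conventional choices, not part of the original text.

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    # Flatten both tensors and measure overlap between mask and prediction.
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    # Dice coefficient: 2 * |A ∩ B| / (|A| + |B|); smooth avoids division by zero.
    dice = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)
    return 1.0 - dice  # a loss, so lower is better
```

It can then be passed to `model.compile(optimizer='adam', loss=dice_loss)` in place of `binary_crossentropy`.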
Example
This example shows how to create a U-Net model, compile it with a loss function and optimizer, train on dummy data, and predict segmentation masks.
```python
import numpy as np
from tensorflow.keras.optimizers import Adam

# Create U-Net model
model = unet(input_shape=(128, 128, 1))
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

# Generate dummy data: 10 grayscale images and masks
x_train = np.random.rand(10, 128, 128, 1).astype(np.float32)
y_train = (np.random.rand(10, 128, 128, 1) > 0.5).astype(np.float32)

# Train model briefly
history = model.fit(x_train, y_train, epochs=2, batch_size=2)

# Predict on new dummy data
x_test = np.random.rand(2, 128, 128, 1).astype(np.float32)
predictions = model.predict(x_test)
print('Predictions shape:', predictions.shape)
print('Sample prediction pixel value:', predictions[0, 64, 64, 0])
Output
Epoch 1/2
5/5 [==============================] - 3s 156ms/step - loss: 0.6931 - accuracy: 0.5000
Epoch 2/2
5/5 [==============================] - 1s 156ms/step - loss: 0.6929 - accuracy: 0.5000
1/1 [==============================] - 0s 28ms/step
Predictions shape: (2, 128, 128, 1)
Sample prediction pixel value: 0.4999995231628418
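Because the output layer uses a sigmoid, the predicted values are probabilities in [0, 1], not a binary mask. A common post-processing step is to threshold them; 0.5 below is a conventional cutoff, not a requirement. The random array stands in for the `predictions` returned by `model.predict` above.

```python
import numpy as np

# Stand-in for model.predict output: shape (2, 128, 128, 1), values in [0, 1]
predictions = np.random.rand(2, 128, 128, 1).astype(np.float32)

# Threshold sigmoid probabilities to get a binary mask
binary_masks = (predictions > 0.5).astype(np.uint8)
print('Mask shape:', binary_masks.shape)
print('Unique values:', np.unique(binary_masks))  # only 0 and 1
```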
Common Pitfalls
- Incorrect input shape: U-Net expects 3D input (height, width, channels). Missing channel dimension causes errors.
- Improper loss function: use binary_crossentropy for binary masks, or a specialized loss like Dice loss for better segmentation.
- Skipping skip connections: not concatenating encoder features into the decoder reduces segmentation accuracy.
- Overfitting: U-Net can overfit small datasets; use data augmentation or regularization.
```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Concatenate
from tensorflow.keras.models import Model

# Wrong: Missing skip connections
inputs = Input((128, 128, 1))
c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
p1 = MaxPooling2D()(c1)
c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
c3 = Conv2D(64, 3, activation='relu', padding='same')(c2)
u1 = UpSampling2D()(c3)
# Missing concatenate here
c4 = Conv2D(32, 3, activation='relu', padding='same')(u1)
u2 = UpSampling2D()(c4)
c5 = Conv2D(16, 3, activation='relu', padding='same')(u2)
outputs = Conv2D(1, 1, activation='sigmoid')(c5)
model_wrong = Model(inputs, outputs)

# Right: Include skip connections
u1 = UpSampling2D()(c3)
u1 = Concatenate()([u1, c2])
c4 = Conv2D(32, 3, activation='relu', padding='same')(u1)
u2 = UpSampling2D()(c4)
u2 = Concatenate()([u2, c1])
c5 = Conv2D(16, 3, activation='relu', padding='same')(u2)
outputs = Conv2D(1, 1, activation='sigmoid')(c5)
model_right = Model(inputs, outputs)
```
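To address the overfitting pitfall, segmentation data can be augmented; the key constraint is that the image and its mask must receive the identical transform so pixels and labels stay aligned. The helper below is a minimal sketch using random flips; the function name and flip probability are illustrative choices, not from the original text.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the same random flips to an image and its mask.
    Transforming both together keeps pixels and labels aligned."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]  # vertical flip
    return image, mask

rng = np.random.default_rng(0)
img = np.random.rand(128, 128, 1).astype(np.float32)
msk = (np.random.rand(128, 128, 1) > 0.5).astype(np.float32)
aug_img, aug_msk = augment_pair(img, msk, rng)
print(aug_img.shape, aug_msk.shape)
```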
Quick Reference
Key tips for using U-Net:
- Input shape: (height, width, channels), usually grayscale or RGB.
- Use binary_crossentropy or Dice loss for training.
- Keep skip connections to preserve spatial details.
- Train with enough data or augment to avoid overfitting.
- Output activation: sigmoid for binary segmentation.
Key Takeaways
- U-Net segments images by combining encoder and decoder paths with skip connections.
- Use correct input shapes and loss functions like binary crossentropy for training.
- Skip connections are essential for accurate segmentation results.
- Avoid overfitting by using enough data or data augmentation.
- The output layer uses sigmoid activation for binary mask prediction.