Semantic Segmentation in Python for Computer Vision: How to Guide
To do semantic segmentation in Python for computer vision, use a deep learning model such as U-Net or DeepLabV3 with a framework like TensorFlow or PyTorch. Prepare labeled images with pixel-wise class masks, train the model on this data, then predict pixel classes on new images.
Syntax
Semantic segmentation involves building a model that takes an image as input and outputs a mask where each pixel is labeled with a class. The typical syntax pattern includes:
- model = build_model(): Create the segmentation model architecture.
- model.compile(): Set optimizer, loss, and metrics.
- model.fit(train_images, train_masks): Train the model with images and their pixel masks.
- predictions = model.predict(test_images): Get predicted masks for new images.
python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_unet(input_shape):
    inputs = layers.Input(shape=input_shape)
    # Encoder: two conv blocks, each followed by downsampling
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(c1)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(p1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(c2)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck
    c3 = layers.Conv2D(64, 3, activation='relu', padding='same')(p2)
    c3 = layers.Conv2D(64, 3, activation='relu', padding='same')(c3)
    # Decoder: upsample and concatenate encoder features (skip connections)
    u2 = layers.UpSampling2D()(c3)
    u2 = layers.concatenate([u2, c2])
    c4 = layers.Conv2D(32, 3, activation='relu', padding='same')(u2)
    c4 = layers.Conv2D(32, 3, activation='relu', padding='same')(c4)
    u1 = layers.UpSampling2D()(c4)
    u1 = layers.concatenate([u1, c1])
    c5 = layers.Conv2D(16, 3, activation='relu', padding='same')(u1)
    c5 = layers.Conv2D(16, 3, activation='relu', padding='same')(c5)
    # One output channel with sigmoid for binary segmentation
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c5)
    model = models.Model(inputs=[inputs], outputs=[outputs])
    return model

model = build_unet((128, 128, 3))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Example
This example shows how to train a simple U-Net model on dummy data for semantic segmentation. It creates random images and masks, trains the model for a few steps, and predicts a mask.
python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Build U-Net model function (same as Syntax section)
def build_unet(input_shape):
    inputs = layers.Input(shape=input_shape)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(c1)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(p1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(c2)
    p2 = layers.MaxPooling2D()(c2)
    c3 = layers.Conv2D(64, 3, activation='relu', padding='same')(p2)
    c3 = layers.Conv2D(64, 3, activation='relu', padding='same')(c3)
    u2 = layers.UpSampling2D()(c3)
    u2 = layers.concatenate([u2, c2])
    c4 = layers.Conv2D(32, 3, activation='relu', padding='same')(u2)
    c4 = layers.Conv2D(32, 3, activation='relu', padding='same')(c4)
    u1 = layers.UpSampling2D()(c4)
    u1 = layers.concatenate([u1, c1])
    c5 = layers.Conv2D(16, 3, activation='relu', padding='same')(u1)
    c5 = layers.Conv2D(16, 3, activation='relu', padding='same')(c5)
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c5)
    model = models.Model(inputs=[inputs], outputs=[outputs])
    return model

# Create dummy data
train_images = np.random.rand(10, 128, 128, 3).astype(np.float32)
train_masks = np.random.randint(0, 2, (10, 128, 128, 1)).astype(np.float32)

# Build and compile model
model = build_unet((128, 128, 3))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train model briefly
history = model.fit(train_images, train_masks, epochs=2, batch_size=2)

# Predict on new dummy image
test_image = np.random.rand(1, 128, 128, 3).astype(np.float32)
pred_mask = model.predict(test_image)
print('Predicted mask shape:', pred_mask.shape)
Output
Epoch 1/2
5/5 [==============================] - 3s 108ms/step - loss: 0.6931 - accuracy: 0.5000
Epoch 2/2
5/5 [==============================] - 0s 98ms/step - loss: 0.6929 - accuracy: 0.5000
1/1 [==============================] - 0s 59ms/step
Predicted mask shape: (1, 128, 128, 1)
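The predicted mask above holds a sigmoid probability per pixel, not hard labels. A minimal sketch of converting it to a binary mask by thresholding at 0.5 (here a random array stands in for the model.predict output):

```python
import numpy as np

# Stand-in for pred_mask from model.predict: per-pixel probabilities in [0, 1]
pred_mask = np.random.rand(1, 128, 128, 1).astype(np.float32)

# Threshold at 0.5 to get a hard 0/1 mask
binary_mask = (pred_mask > 0.5).astype(np.uint8)
print('Binary mask shape:', binary_mask.shape)
```

The 0.5 threshold is a common default; it can be tuned on validation data if one class should be favored.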
Common Pitfalls
Common mistakes when doing semantic segmentation include:
- Using incorrect mask shapes or data types (masks must match image size and be categorical or binary).
- Not normalizing input images properly, causing poor training.
- Choosing the wrong loss function (use categorical_crossentropy for multi-class, binary_crossentropy for two classes).
- Ignoring overfitting by not using validation data or augmentation.
python
import numpy as np

# Wrong mask shape example
wrong_mask = np.random.randint(0, 2, (10, 64, 64, 1))  # Smaller than image size 128x128

# Correct mask shape example
correct_mask = np.random.randint(0, 2, (10, 128, 128, 1))

print('Wrong mask shape:', wrong_mask.shape)
print('Correct mask shape:', correct_mask.shape)
Output
Wrong mask shape: (10, 64, 64, 1)
Correct mask shape: (10, 128, 128, 1)
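The loss-function pitfall above is easiest to see in a multi-class setup. A minimal sketch, assuming integer-labeled masks (the three-class count and the tiny single-conv model are illustrative, not a full U-Net): output one softmax channel per class and train with sparse_categorical_crossentropy, which accepts integer class IDs without one-hot encoding.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 3  # illustrative class count

# Tiny stand-in model: one softmax channel per class
inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(8, 3, activation='relu', padding='same')(inputs)
outputs = layers.Conv2D(NUM_CLASSES, 1, activation='softmax')(x)
model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Masks hold integer class IDs per pixel; no channel axis or one-hot needed
images = np.random.rand(4, 64, 64, 3).astype(np.float32)
masks = np.random.randint(0, NUM_CLASSES, (4, 64, 64))
model.fit(images, masks, epochs=1, batch_size=2, verbose=0)

pred = model.predict(images, verbose=0)   # per-pixel class probabilities
class_map = np.argmax(pred, axis=-1)      # predicted class ID per pixel
print('Class map shape:', class_map.shape)
```

If masks are already one-hot encoded with shape (batch, H, W, NUM_CLASSES), use categorical_crossentropy instead.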
Quick Reference
- Model: Use U-Net or DeepLabV3 architectures.
- Data: Images and pixel-wise labeled masks of same size.
- Loss: Binary crossentropy for 2 classes, categorical crossentropy for multiple classes.
- Training: Normalize images, batch data, use validation split.
- Prediction: Model outputs mask with class probabilities per pixel.
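The normalization and validation items above can be sketched in a few lines (the raw uint8 images here are dummy data standing in for a real dataset):

```python
import numpy as np

# Raw images often arrive as uint8 in [0, 255]; scale to [0, 1] before training
raw_images = np.random.randint(0, 256, (10, 128, 128, 3)).astype(np.uint8)
images = raw_images.astype(np.float32) / 255.0
print('Min:', images.min(), 'Max:', images.max())

# Holding out validation data is then a single argument to fit, e.g.:
# model.fit(images, masks, epochs=5, batch_size=2, validation_split=0.2)
```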
Key Takeaways
Semantic segmentation labels each pixel of an image with a class using deep learning models like U-Net.
Prepare your dataset with images and matching pixel-wise masks of the same size and format.
Use appropriate loss functions: binary crossentropy for two classes, categorical crossentropy for multiple classes.
Normalize input images and ensure mask shapes match images to avoid training errors.
Validate your model with separate data and consider augmentation to improve generalization.
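Augmentation for segmentation has one extra wrinkle: any spatial transform must be applied to the image and its mask together, or the pixel labels drift out of alignment. A minimal sketch with a hypothetical paired-flip helper (not from the text above):

```python
import numpy as np

def random_flip(image, mask, rng):
    """Flip image and mask together so per-pixel labels stay aligned."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)  # horizontal flip
        mask = np.flip(mask, axis=1)    # same flip applied to the mask
    return image, mask

rng = np.random.default_rng(0)
img = np.random.rand(128, 128, 3).astype(np.float32)
msk = np.random.randint(0, 2, (128, 128, 1)).astype(np.float32)
aug_img, aug_msk = random_flip(img, msk, rng)
print('Augmented shapes:', aug_img.shape, aug_msk.shape)
```

The same pairing rule applies to rotations, crops, and any other geometric transform; purely photometric changes (brightness, contrast) touch only the image.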