Semantic Segmentation in Python for Computer Vision: How to Guide
To do semantic segmentation in Python for computer vision, use a deep learning model such as U-Net or DeepLabV3 with a framework like TensorFlow or PyTorch. Prepare labeled images with pixel-wise class masks, train the model on this data, then predict pixel classes on new images.
Syntax
Semantic segmentation involves building a model that takes an image as input and outputs a mask where each pixel is labeled with a class. The typical syntax pattern includes:
- model = build_model(): Create the segmentation model architecture.
- model.compile(): Set optimizer, loss, and metrics.
- model.fit(train_images, train_masks): Train the model with images and their pixel masks.
- predictions = model.predict(test_images): Get predicted masks for new images.
python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_unet(input_shape):
    inputs = layers.Input(shape=input_shape)
    # Encoder: two conv blocks, each followed by downsampling
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(c1)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(p1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(c2)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck
    c3 = layers.Conv2D(64, 3, activation='relu', padding='same')(p2)
    c3 = layers.Conv2D(64, 3, activation='relu', padding='same')(c3)
    # Decoder: upsample and concatenate encoder features (skip connections)
    u2 = layers.UpSampling2D()(c3)
    u2 = layers.concatenate([u2, c2])
    c4 = layers.Conv2D(32, 3, activation='relu', padding='same')(u2)
    c4 = layers.Conv2D(32, 3, activation='relu', padding='same')(c4)
    u1 = layers.UpSampling2D()(c4)
    u1 = layers.concatenate([u1, c1])
    c5 = layers.Conv2D(16, 3, activation='relu', padding='same')(u1)
    c5 = layers.Conv2D(16, 3, activation='relu', padding='same')(c5)
    # One output channel with sigmoid for binary segmentation
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c5)
    model = models.Model(inputs=[inputs], outputs=[outputs])
    return model

model = build_unet((128, 128, 3))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Example
This example shows how to train a simple U-Net model on dummy data for semantic segmentation. It creates random images and masks, trains the model for a few steps, and predicts a mask.
python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Build U-Net model function (same as Syntax section)
def build_unet(input_shape):
    inputs = layers.Input(shape=input_shape)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(c1)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(p1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(c2)
    p2 = layers.MaxPooling2D()(c2)
    c3 = layers.Conv2D(64, 3, activation='relu', padding='same')(p2)
    c3 = layers.Conv2D(64, 3, activation='relu', padding='same')(c3)
    u2 = layers.UpSampling2D()(c3)
    u2 = layers.concatenate([u2, c2])
    c4 = layers.Conv2D(32, 3, activation='relu', padding='same')(u2)
    c4 = layers.Conv2D(32, 3, activation='relu', padding='same')(c4)
    u1 = layers.UpSampling2D()(c4)
    u1 = layers.concatenate([u1, c1])
    c5 = layers.Conv2D(16, 3, activation='relu', padding='same')(u1)
    c5 = layers.Conv2D(16, 3, activation='relu', padding='same')(c5)
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c5)
    model = models.Model(inputs=[inputs], outputs=[outputs])
    return model

# Create dummy data
train_images = np.random.rand(10, 128, 128, 3).astype(np.float32)
train_masks = np.random.randint(0, 2, (10, 128, 128, 1)).astype(np.float32)

# Build and compile model
model = build_unet((128, 128, 3))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train model briefly
history = model.fit(train_images, train_masks, epochs=2, batch_size=2)

# Predict on new dummy image
test_image = np.random.rand(1, 128, 128, 3).astype(np.float32)
pred_mask = model.predict(test_image)
print('Predicted mask shape:', pred_mask.shape)
Output
Epoch 1/2
5/5 [==============================] - 3s 108ms/step - loss: 0.6931 - accuracy: 0.5000
Epoch 2/2
5/5 [==============================] - 0s 98ms/step - loss: 0.6929 - accuracy: 0.5000
1/1 [==============================] - 0s 59ms/step
Predicted mask shape: (1, 128, 128, 1)
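The predicted mask above holds a sigmoid probability per pixel, not hard labels. A minimal sketch of converting it to a binary mask by thresholding at 0.5 (here a random array stands in for the model.predict output):

```python
import numpy as np

# Stand-in for pred_mask from model.predict: per-pixel probabilities in [0, 1]
pred_mask = np.random.rand(1, 128, 128, 1).astype(np.float32)

# Threshold at 0.5 to get a hard 0/1 mask
binary_mask = (pred_mask > 0.5).astype(np.uint8)
print('Binary mask shape:', binary_mask.shape)
```

The 0.5 threshold is a common default; it can be tuned on validation data if one class should be favored.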
Common Pitfalls
Common mistakes when doing semantic segmentation include:
- Using incorrect mask shapes or data types (masks must match image size and be categorical or binary).
- Not normalizing input images properly, causing poor training.
- Choosing the wrong loss function (use categorical_crossentropy for multi-class, binary_crossentropy for two classes).
- Ignoring overfitting by not using validation data or augmentation.
python
import numpy as np

# Wrong mask shape example
wrong_mask = np.random.randint(0, 2, (10, 64, 64, 1))  # Smaller than image size 128x128

# Correct mask shape example
correct_mask = np.random.randint(0, 2, (10, 128, 128, 1))

print('Wrong mask shape:', wrong_mask.shape)
print('Correct mask shape:', correct_mask.shape)
Output
Wrong mask shape: (10, 64, 64, 1)
Correct mask shape: (10, 128, 128, 1)
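The loss-function pitfall above is easiest to see in a multi-class setup. A minimal sketch, assuming integer-labeled masks (the three-class count and the tiny single-conv model are illustrative, not a full U-Net): output one softmax channel per class and train with sparse_categorical_crossentropy, which accepts integer class IDs without one-hot encoding.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 3  # illustrative class count

# Tiny stand-in model: one softmax channel per class
inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(8, 3, activation='relu', padding='same')(inputs)
outputs = layers.Conv2D(NUM_CLASSES, 1, activation='softmax')(x)
model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Masks hold integer class IDs per pixel; no channel axis or one-hot needed
images = np.random.rand(4, 64, 64, 3).astype(np.float32)
masks = np.random.randint(0, NUM_CLASSES, (4, 64, 64))
model.fit(images, masks, epochs=1, batch_size=2, verbose=0)

pred = model.predict(images, verbose=0)   # per-pixel class probabilities
class_map = np.argmax(pred, axis=-1)      # predicted class ID per pixel
print('Class map shape:', class_map.shape)
```

If masks are already one-hot encoded with shape (batch, H, W, NUM_CLASSES), use categorical_crossentropy instead.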
Quick Reference
- Model: Use U-Net or DeepLabV3 architectures.
- Data: Images and pixel-wise labeled masks of same size.
- Loss: Binary crossentropy for 2 classes, categorical crossentropy for multiple classes.
- Training: Normalize images, batch data, use validation split.
- Prediction: Model outputs mask with class probabilities per pixel.
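The normalization and validation items above can be sketched in a few lines (the raw uint8 images here are dummy data standing in for a real dataset):

```python
import numpy as np

# Raw images often arrive as uint8 in [0, 255]; scale to [0, 1] before training
raw_images = np.random.randint(0, 256, (10, 128, 128, 3)).astype(np.uint8)
images = raw_images.astype(np.float32) / 255.0
print('Min:', images.min(), 'Max:', images.max())

# Holding out validation data is then a single argument to fit, e.g.:
# model.fit(images, masks, epochs=5, batch_size=2, validation_split=0.2)
```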
Key Takeaways
Semantic segmentation labels each pixel of an image with a class using deep learning models like U-Net.
Prepare your dataset with images and matching pixel-wise masks of the same size and format.
Use appropriate loss functions: binary crossentropy for two classes, categorical crossentropy for multiple classes.
Normalize input images and ensure mask shapes match images to avoid training errors.
Validate your model with separate data and consider augmentation to improve generalization.
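Augmentation for segmentation has one extra wrinkle: any spatial transform must be applied to the image and its mask together, or the pixel labels drift out of alignment. A minimal sketch with a hypothetical paired-flip helper (not from the text above):

```python
import numpy as np

def random_flip(image, mask, rng):
    """Flip image and mask together so per-pixel labels stay aligned."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)  # horizontal flip
        mask = np.flip(mask, axis=1)    # same flip applied to the mask
    return image, mask

rng = np.random.default_rng(0)
img = np.random.rand(128, 128, 3).astype(np.float32)
msk = np.random.randint(0, 2, (128, 128, 1)).astype(np.float32)
aug_img, aug_msk = random_flip(img, msk, rng)
print('Augmented shapes:', aug_img.shape, aug_msk.shape)
```

The same pairing rule applies to rotations, crops, and any other geometric transform; purely photometric changes (brightness, contrast) touch only the image.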