
Autoencoder for images in Computer Vision

Introduction

An autoencoder is a neural network that learns to copy images by compressing them and then rebuilding them. This helps computers learn the important parts of pictures without needing labels. Common uses include:

Reducing image size while keeping important details.
Removing noise from pictures by learning their clean versions.
Finding patterns in images without knowing their labels.
Preparing images for other tasks like classification or search.
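For the noise-removal use, the usual trick is to train on corrupted inputs while keeping the clean images as the targets. A minimal sketch with synthetic stand-in data (the noise_factor value and the random images are assumptions for illustration):

```python
import numpy as np

# Minimal denoising setup: the model would see noisy inputs
# but be trained to reproduce the clean originals.
rng = np.random.default_rng(0)
clean = rng.random((8, 28, 28, 1)).astype("float32")  # stand-in images

noise_factor = 0.3  # assumed value for illustration
noisy = clean + noise_factor * rng.standard_normal(clean.shape).astype("float32")
noisy = np.clip(noisy, 0.0, 1.0)  # keep pixel values in [0, 1]

# Training pair for a denoising autoencoder:
# input = noisy, target = clean, e.g. autoencoder.fit(noisy, clean, ...)
print(noisy.shape, clean.shape)
```

The only change from a plain autoencoder is the training pair: noisy in, clean out.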
Syntax
Define encoder layers to shrink image data.
Define decoder layers to rebuild the image.
Train the model to minimize the difference between input and output images.

The encoder compresses the image into a smaller representation called the latent space.

The decoder tries to recreate the original image from this compressed form.
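As a rough illustration of the compression, here is the arithmetic using the dimensions of the sample model further down this page (28x28 grayscale input, two 2x poolings, 8 latent filters):

```python
# How much smaller is the latent space than the input image?
input_pixels = 28 * 28 * 1   # 784 values in a 28x28 grayscale image
latent_values = 7 * 7 * 8    # 392 values after two 2x poolings with 8 filters
print(input_pixels, latent_values, input_pixels / latent_values)
```

So the decoder must rebuild 784 pixel values from roughly half as many latent values, which forces the encoder to keep only the most informative features.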

Examples
This example shows a simple convolutional autoencoder structure for images.
Encoder: Input image -> Conv2D layers -> Flatten -> Dense (latent space)
Decoder: Dense -> Reshape -> Conv2DTranspose layers -> Output image
Use Mean Squared Error (MSE) loss to measure how close the output images are to the inputs; minimizing this loss teaches the model to rebuild images accurately.
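As a small sketch, MSE is just the average of the squared pixel-wise differences between the input and the reconstruction (the two tiny 2x2 "images" here are made up for illustration):

```python
import numpy as np

# MSE between an "input" image and its "reconstruction": the mean of the
# squared pixel-wise differences. Lower is better; 0 means identical.
original = np.array([[0.0, 1.0], [0.5, 0.5]], dtype="float32")
reconstruction = np.array([[0.1, 0.9], [0.5, 0.5]], dtype="float32")

mse = np.mean((original - reconstruction) ** 2)
print(round(float(mse), 4))  # 0.005
```

Two pixels differ by 0.1 each, so the mean of the squared differences is (0.01 + 0.01 + 0 + 0) / 4 = 0.005.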
Sample Model

This code trains a simple convolutional autoencoder on handwritten digit images (MNIST). It compresses and then reconstructs images. The training and validation loss show how well the model learns. The output shape confirms the image size is preserved.

import tensorflow as tf
from tensorflow.keras import layers, models

# Load and prepare MNIST dataset
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

# Encoder
input_img = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, 3, activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D(2, padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
x = layers.MaxPooling2D(2, padding='same')(x)

# Latent space (7x7x8 after two 2x poolings)
encoded = layers.Conv2D(8, 3, activation='relu', padding='same')(x)

# Decoder
x = layers.Conv2DTranspose(8, 3, strides=2, activation='relu', padding='same')(encoded)
x = layers.Conv2DTranspose(16, 3, strides=2, activation='relu', padding='same')(x)

# Output layer
decoded = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)

# Autoencoder model
autoencoder = models.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train the model
history = autoencoder.fit(x_train, x_train, epochs=3, batch_size=128, validation_data=(x_test, x_test))

# Predict on test images
decoded_imgs = autoencoder.predict(x_test[:5])

# Show loss values
print(f"Final training loss: {history.history['loss'][-1]:.4f}")
print(f"Final validation loss: {history.history['val_loss'][-1]:.4f}")

# Show first prediction shape
print(f"Shape of first decoded image: {decoded_imgs[0].shape}")
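Once trained, the encoder half of the model can be reused on its own, for example to produce compact features for search or clustering. A minimal sketch that rebuilds the same encoder stack as the sample model above (untrained here, and fed zero-valued placeholder images just to show the shapes):

```python
import numpy as np
from tensorflow.keras import layers, models

# Same encoder stack as the sample model, wrapped as a standalone model.
inp = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, 3, activation='relu', padding='same')(inp)
x = layers.MaxPooling2D(2, padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
x = layers.MaxPooling2D(2, padding='same')(x)
enc = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
encoder = models.Model(inp, enc)

# Each 28x28 image becomes a 7x7x8 latent volume: 392 values vs 784 pixels.
latent = encoder.predict(np.zeros((5, 28, 28, 1), dtype="float32"))
print(latent.shape)  # (5, 7, 7, 8)
```

In practice you would build this encoder from the already-trained autoencoder's layers (e.g. models.Model(input_img, encoded)) so the learned weights are reused.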
Important Notes

Autoencoders learn without needing labels, making them useful for many image tasks.

Training longer or with more data usually improves image reconstruction quality.

Convolutional layers capture spatial patterns in images, so they usually reconstruct images better than simple dense layers.

Summary

Autoencoders compress and rebuild images to learn important features.

They are useful for noise removal, compression, and pattern discovery.

Convolutional autoencoders work well for image data by capturing spatial details.