Autoencoder for Images in Computer Vision

An autoencoder learns to copy images by first compressing and then rebuilding them. This lets a model learn the important features of pictures without needing any labels.
Building an image autoencoder involves three steps:
1. Define encoder layers that shrink the image data.
2. Define decoder layers that rebuild the image.
3. Train the model to minimize the difference between input and output images.
The encoder compresses the image into a smaller representation called the latent space.
The decoder then tries to recreate the original image from this compressed representation.
Encoder: Input image -> Conv2D layers -> Flatten -> Dense (latent space)
Decoder: Dense -> Reshape -> Conv2DTranspose layers -> Output image
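The Flatten/Dense layout above can be sketched in Keras as follows. The layer sizes and the latent dimension of 32 are illustrative assumptions, not values from the training code later in this article (which uses a fully convolutional latent space instead):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

LATENT_DIM = 32  # assumed size of the latent vector

# Encoder: Conv2D layers shrink the image, then Flatten + Dense form the latent space
inp = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, 3, strides=2, activation='relu', padding='same')(inp)  # 14x14
x = layers.Conv2D(8, 3, strides=2, activation='relu', padding='same')(x)     # 7x7
x = layers.Flatten()(x)
latent = layers.Dense(LATENT_DIM, activation='relu')(x)                      # latent space

# Decoder: Dense + Reshape, then Conv2DTranspose layers rebuild the image
x = layers.Dense(7 * 7 * 8, activation='relu')(latent)
x = layers.Reshape((7, 7, 8))(x)
x = layers.Conv2DTranspose(16, 3, strides=2, activation='relu', padding='same')(x)  # 14x14
out = layers.Conv2DTranspose(1, 3, strides=2, activation='sigmoid', padding='same')(x)  # 28x28

dense_autoencoder = models.Model(inp, out)
print(dense_autoencoder.output_shape)  # (None, 28, 28, 1)
```

A Dense latent layer gives a fixed-size vector per image, which is convenient when the compressed code itself is the output you care about.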
Use Mean Squared Error (MSE) loss to measure how close the output images are to the inputs.
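MSE is simply the average of the squared pixel-wise differences. A minimal NumPy illustration, using tiny made-up 2x2 "images" with pixel values in [0, 1]:

```python
import numpy as np

# A made-up input image and its (imperfect) reconstruction
original = np.array([[0.0, 1.0], [0.5, 0.5]])
reconstruction = np.array([[0.1, 0.9], [0.5, 0.3]])

# Mean Squared Error: average squared difference over all pixels
mse = np.mean((original - reconstruction) ** 2)
print(mse)  # 0.015
```

A perfect reconstruction gives an MSE of 0; larger values mean the decoder is losing more detail.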
This code trains a simple convolutional autoencoder on handwritten digit images (MNIST). It compresses and then reconstructs images. The training and validation loss show how well the model learns. The output shape confirms the image size is preserved.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load and prepare MNIST dataset
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

# Encoder
input_img = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, 3, activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D(2, padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
x = layers.MaxPooling2D(2, padding='same')(x)

# Latent space
encoded = layers.Conv2D(8, 3, activation='relu', padding='same')(x)

# Decoder
x = layers.Conv2DTranspose(8, 3, strides=2, activation='relu', padding='same')(encoded)
x = layers.Conv2DTranspose(16, 3, strides=2, activation='relu', padding='same')(x)

# Output layer
decoded = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)

# Autoencoder model
autoencoder = models.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train the model
history = autoencoder.fit(x_train, x_train, epochs=3, batch_size=128,
                          validation_data=(x_test, x_test))

# Predict on test images
decoded_imgs = autoencoder.predict(x_test[:5])

# Show loss values
print(f"Final training loss: {history.history['loss'][-1]:.4f}")
print(f"Final validation loss: {history.history['val_loss'][-1]:.4f}")

# Show first prediction shape
print(f"Shape of first decoded image: {decoded_imgs[0].shape}")
Autoencoders learn without needing labels, making them useful for many image tasks.
Training longer or with more data usually improves image reconstruction quality.
Using convolution layers helps the model understand image patterns better than simple dense layers.
Autoencoders compress and rebuild images to learn important features.
They are useful for noise removal, compression, and pattern discovery.
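For the noise-removal use case, a denoising autoencoder is trained to map corrupted inputs back to the clean originals, i.e. `autoencoder.fit(x_noisy, x_clean, ...)`. A sketch of the data preparation step, using random stand-in images and an assumed noise level of 0.2:

```python
import numpy as np

# Stand-in for a batch of clean images (8 images, 28x28, 1 channel, values in [0, 1])
rng = np.random.default_rng(0)
x_clean = rng.random((8, 28, 28, 1)).astype('float32')

# Corrupt the images with Gaussian noise; 0.2 is an assumed noise level
noise = 0.2 * rng.standard_normal(x_clean.shape).astype('float32')
x_noisy = np.clip(x_clean + noise, 0.0, 1.0)  # keep pixel values in [0, 1]

print(x_noisy.shape)  # (8, 28, 28, 1)
```

Because the targets are the clean images rather than the noisy inputs, the model cannot simply copy its input and is pushed to learn what typical images look like.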
Convolutional autoencoders work well for image data by capturing spatial details.