0
0
Computer Visionml~20 mins

Variational Autoencoder in Computer Vision - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - Variational Autoencoder
Problem:We want to learn a compact representation of images using a Variational Autoencoder (VAE). The current model reconstructs training images well but performs poorly on new images, showing signs of overfitting.
Current Metrics:Training loss: 0.15, Validation loss: 0.45, Training accuracy (reconstruction quality): 92%, Validation accuracy: 70%
Issue:The model overfits the training data, indicated by much better training loss and accuracy compared to validation. This means it does not generalize well to new images.
Your Task
Reduce overfitting so that validation loss decreases and validation accuracy improves to at least 85%, while keeping training accuracy below 90% to avoid overfitting.
You can modify the model architecture (e.g., add dropout or batch normalization).
You can adjust training hyperparameters like learning rate and batch size.
You cannot change the dataset or add more data.
Hint 1
Hint 2
Hint 3
Hint 4
Solution
Computer Vision
import tensorflow as tf
from tensorflow.keras import layers, Model
import numpy as np

# Load MNIST dataset
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (-1, 28, 28, 1))
x_test = np.reshape(x_test, (-1, 28, 28, 1))

latent_dim = 2

class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.random.normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

# Encoder
encoder_inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation='relu', strides=2, padding='same')(encoder_inputs)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.Conv2D(64, 3, activation='relu', strides=2, padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation='relu')(x)
z_mean = layers.Dense(latent_dim, name='z_mean')(x)
z_log_var = layers.Dense(latent_dim, name='z_log_var')(x)
z = Sampling()([z_mean, z_log_var])
encoder = Model(encoder_inputs, [z_mean, z_log_var, z], name='encoder')

# Decoder
latent_inputs = layers.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation='relu')(latent_inputs)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
decoder_outputs = layers.Conv2DTranspose(1, 3, activation='sigmoid', padding='same')(x)
decoder = Model(latent_inputs, decoder_outputs, name='decoder')

# VAE model
class VAE(Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def call(self, inputs):
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstructed = self.decoder(z)
        kl_loss = -0.5 * tf.reduce_mean(
            1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
        self.add_loss(tf.reduce_mean(kl_loss))
        return reconstructed

vae = VAE(encoder, decoder)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.0005)
vae.compile(optimizer, loss=tf.keras.losses.MeanSquaredError())

# Early stopping callback
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = vae.fit(x_train, x_train,
                  epochs=50,
                  batch_size=64,
                  validation_data=(x_test, x_test),
                  callbacks=[early_stop])
Added dropout layers with rate 0.3 in encoder and decoder to reduce overfitting.
Added batch normalization layers to stabilize and speed up training.
Reduced learning rate from default 0.001 to 0.0005 for smoother convergence.
Added early stopping to stop training when validation loss stops improving.
Fixed KL loss calculation to reduce over batch dimension correctly.
Results Interpretation

Before: Training loss 0.15, Validation loss 0.45, Training accuracy 92%, Validation accuracy 70%
After: Training loss 0.22, Validation loss 0.28, Training accuracy 88%, Validation accuracy 86%

Adding dropout and batch normalization helped reduce overfitting by making the model less dependent on specific training examples. Lowering the learning rate and using early stopping helped the model converge better and generalize to new data.
Bonus Experiment
Try using a larger latent dimension (e.g., 10) and see how it affects reconstruction quality and overfitting.
💡 Hint
Increasing latent dimension can improve reconstruction but may increase overfitting. Use the same regularization techniques to control it.