You want to build an autoencoder to compress and reconstruct 28x28 grayscale images. Which model architecture is best suited for this task?
Think about which model type can capture spatial patterns in images effectively.
Convolutional autoencoders use convolutional layers to capture spatial features in images, making them better suited to image compression and reconstruction than fully connected or sequential models.
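As a minimal sketch (layer sizes are illustrative, not prescribed by the question), a convolutional autoencoder pairs strided convolutions in the encoder with transposed convolutions in the decoder so the output returns to the 28x28 input size:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 1x28x28 -> 16x14x14
            nn.ReLU(),
            nn.Conv2d(16, 8, 3, stride=2, padding=1),   # 16x14x14 -> 8x7x7
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=2, padding=1, output_padding=1),   # 8x7x7 -> 16x14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 16x14x14 -> 1x28x28
            nn.Sigmoid(),  # squash reconstructed pixels into [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
x = torch.randn(4, 1, 28, 28)
print(model(x).shape)  # torch.Size([4, 1, 28, 28])
```

The `output_padding=1` arguments make the transposed convolutions exactly invert the stride-2 downsampling.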
Given the following encoder code for 28x28 grayscale images, what is the shape of the encoded output?
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, stride=2, padding=1)  # output: 16 x 14 x 14
        self.conv2 = nn.Conv2d(16, 8, 3, stride=2, padding=1)  # output: 8 x 7 x 7

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        return x

encoder = Encoder()
input_tensor = torch.randn(1, 1, 28, 28)
encoded = encoder(input_tensor)
print(encoded.shape)
Calculate output size after each convolution using stride and padding.
With kernel size 3, stride 2, and padding 1, the output size is floor((n + 2*1 - 3) / 2) + 1, which halves each spatial dimension: the first conv reduces 28x28 to 14x14 with 16 channels, the second conv reduces 14x14 to 7x7 with 8 channels, so the output shape is [1, 8, 7, 7].
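The shape arithmetic above can be checked with the standard convolution output-size formula (a small helper written for illustration):

```python
import math

def conv_out(n, kernel, stride, padding):
    """Output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return math.floor((n + 2 * padding - kernel) / stride) + 1

print(conv_out(28, 3, 2, 1))  # 14: first conv halves 28 -> 14
print(conv_out(14, 3, 2, 1))  # 7: second conv halves 14 -> 7
```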
You are training an autoencoder on 64x64 RGB images. What is the effect of choosing a very small latent dimension size?
Think about what happens if the compressed representation is too small to hold enough information.
A very small latent space forces the model to lose important details, leading to blurry or inaccurate reconstructions.
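To make the bottleneck concrete, here is a hedged sketch (the latent dimension of 16 is an illustrative choice, not from the question) showing how severe the compression is for a 64x64 RGB image:

```python
import torch
import torch.nn as nn

input_dim = 64 * 64 * 3   # 12288 values per RGB image
latent_dim = 16           # deliberately tiny bottleneck

# Illustrative fully connected encoder/decoder pair.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(input_dim, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, input_dim), nn.Unflatten(1, (3, 64, 64)))

x = torch.randn(1, 3, 64, 64)
z = encoder(x)
recon = decoder(z)

# 12288 values squeezed into 16: a 768x compression, so fine detail
# (textures, sharp edges) cannot survive the round trip.
print(input_dim // latent_dim)  # 768
print(z.shape, recon.shape)
```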
Which metric is best to measure how well an autoencoder reconstructs images?
Consider a metric that measures pixel-wise difference between images.
MSE (mean squared error) measures the average squared difference between original and reconstructed pixel values, making it a suitable metric for image reconstruction quality.
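On a tiny made-up example (the pixel values below are arbitrary), MSE can be computed by hand or with PyTorch's built-in loss, and the two agree:

```python
import torch
import torch.nn as nn

original = torch.tensor([[1.0, 0.0], [0.5, 0.5]])
reconstructed = torch.tensor([[0.8, 0.1], [0.5, 0.3]])

# MSE = mean of squared pixel-wise differences:
# (0.2^2 + 0.1^2 + 0.0^2 + 0.2^2) / 4 = 0.09 / 4 = 0.0225
manual = ((original - reconstructed) ** 2).mean()
builtin = nn.MSELoss()(reconstructed, original)

print(manual.item(), builtin.item())  # both 0.0225
```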
You trained an autoencoder on 32x32 grayscale images but the loss does not decrease and reconstructions are random noise. What is the most likely cause?
Think about what causes training loss to not improve and outputs to be random.
A very high learning rate causes the weight updates to overshoot minima of the loss, so the loss fails to decrease and the reconstructions look like random noise.