In a Variational Autoencoder (VAE), the loss function includes a KL divergence term. What role does this term play?
Think about how the latent space is shaped in a VAE.
The KL divergence term encourages the latent variables to approximate a known prior distribution (typically a standard normal), which keeps the latent space smooth and allows new samples to be drawn from the prior for generation.
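For a diagonal-Gaussian encoder and a standard normal prior, this term has a closed form. The sketch below (the helper name kl_divergence is illustrative) computes it from the encoder's mu and logvar outputs:

```python
import torch

def kl_divergence(mu, logvar):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian,
    # summed over latent dimensions, then averaged over the batch.
    return (-0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)).mean()

# When the encoder already matches the prior (mu = 0, logvar = 0),
# the penalty is zero; any deviation from the prior is penalized.
mu = torch.zeros(8, 20)
logvar = torch.zeros(8, 20)
print(kl_divergence(mu, logvar))  # tensor(0.)
```

This is the same expression that appears (as kl_loss) in the training-loop question later in this set.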
Given the following PyTorch encoder code snippet, what is the shape of the latent vector z after sampling?
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, 20)
        self.fc_logvar = nn.Linear(400, 20)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        z = mu + eps * std
        return z

encoder = Encoder()
x = torch.randn(64, 784)
z = encoder(x)
Check the output size of fc_mu and fc_logvar.
The latent vector z has shape (64, 20): the batch size of the input (64) and the latent dimension set by the output size of fc_mu and fc_logvar (20).
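The shape can be checked in isolation: nn.Linear(400, 20) maps the last dimension of its input to 20 and leaves the batch dimension untouched, and the reparameterization step (mu + eps * std) is elementwise, so it preserves that shape. A minimal check:

```python
import torch
import torch.nn as nn

fc_mu = nn.Linear(400, 20)  # same layer shape as fc_mu in the encoder above
h = torch.randn(64, 400)    # a batch of 64 hidden activations
print(fc_mu(h).shape)       # torch.Size([64, 20])
```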
You want to build a decoder for a VAE that reconstructs 28x28 grayscale images (like MNIST). Which architecture is most appropriate?
Consider how spatial information is best preserved and reconstructed.
A CNN decoder built from transposed convolutions (nn.ConvTranspose2d) reconstructs spatial structure in images better than fully connected layers alone, because it upsamples feature maps while preserving local pixel relationships.
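A minimal sketch of such a decoder for 28x28 grayscale images, assuming a 20-dimensional latent vector; the class name and channel counts are illustrative choices, not fixed by the question:

```python
import torch
import torch.nn as nn

class ConvDecoder(nn.Module):
    def __init__(self, latent_dim=20):
        super().__init__()
        # Project the latent vector to a small feature map, then upsample.
        self.fc = nn.Linear(latent_dim, 64 * 7 * 7)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 7x7 -> 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),   # 14x14 -> 28x28
            nn.Sigmoid(),  # pixel values in [0, 1], matching normalized MNIST
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 7, 7)
        return self.deconv(h)

decoder = ConvDecoder()
z = torch.randn(64, 20)
print(decoder(z).shape)  # torch.Size([64, 1, 28, 28])
```

Each ConvTranspose2d with stride 2, kernel 4, and padding 1 exactly doubles the spatial size, so two of them take the 7x7 map back to 28x28.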
After training a VAE on images, which combination of metrics best evaluates its performance?
Think about what the VAE tries to minimize during training.
MSE measures how close the reconstructed images are to the originals, while KL divergence measures how well the latent space matches the prior; together they cover both objectives the VAE minimizes during training.
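A sketch of such an evaluation pass, assuming an encoder that returns (z, mu, logvar) and a decoder that maps z back to flattened images (the helper name evaluate is illustrative):

```python
import torch
import torch.nn.functional as F

def evaluate(encoder, decoder, x):
    # Compute both evaluation metrics on a batch without tracking gradients.
    with torch.no_grad():
        z, mu, logvar = encoder(x)
        x_recon = decoder(z)
        mse = F.mse_loss(x_recon, x)  # reconstruction quality
        kl = (-0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)).mean()
    return mse.item(), kl.item()
```

A low MSE with a very high KL suggests good reconstructions but a latent space that is hard to sample from; the two metrics should be read together.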
Consider this PyTorch training loop snippet for a VAE. Why does it raise a runtime error?
for data in dataloader:
    optimizer.zero_grad()
    x = data.view(-1, 784)
    z = encoder(x)
    x_recon = decoder(z)
    recon_loss = nn.functional.binary_cross_entropy(x_recon, x)
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon_loss + kl_loss
    loss.backward()
    optimizer.step()
Check where mu and logvar come from in the encoder.
The variables mu and logvar are computed inside the encoder but are never returned to the caller, so they are undefined in the training loop's scope; referencing them in the kl_loss line raises a NameError. The fix is to have the encoder return mu and logvar alongside z and unpack all three in the loop.
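A sketch of the corrected pattern, reusing the encoder architecture from the earlier question:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, 20)
        self.fc_logvar = nn.Linear(400, 20)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + torch.randn_like(std) * std
        return z, mu, logvar  # expose mu and logvar to the caller

# In the training loop, unpack all three outputs so the KL term is defined:
#     z, mu, logvar = encoder(x)
#     kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
```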