Computer Vision · ~20 mins

Variational Autoencoder in Computer Vision - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
What is the main purpose of the KL divergence term in a Variational Autoencoder?

In a Variational Autoencoder (VAE), the loss function includes a KL divergence term. What role does this term play?

A. It measures the reconstruction error between input and output images.
B. It increases the complexity of the decoder network to improve output quality.
C. It forces the encoded latent variables to follow a prior distribution, usually a standard normal distribution.
D. It acts as a regularizer to prevent overfitting by dropping neurons randomly.
💡 Hint

Think about how the latent space is shaped in a VAE.
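A quick way to build intuition: the KL term has a closed form for a diagonal Gaussian posterior against a standard normal prior, and it is exactly zero when the encoder outputs the prior itself. A minimal sketch (the `kl_divergence` helper name is ours, not from the snippet above):

```python
import torch

# Closed-form KL divergence between N(mu, sigma^2) and N(0, 1),
# summed over latent dimensions (the standard VAE formulation).
def kl_divergence(mu, logvar):
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# When the encoder outputs exactly the prior (mu=0, logvar=0, i.e. sigma=1),
# the KL term is zero; it grows as the posterior drifts away from N(0, 1).
mu = torch.zeros(1, 20)
logvar = torch.zeros(1, 20)
print(kl_divergence(mu, logvar).item())  # 0.0

mu_far = torch.full((1, 20), 3.0)
print(kl_divergence(mu_far, logvar).item())  # 90.0
```

This is why the KL term shapes the latent space: it penalizes encodings that stray from the chosen prior.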

Predict Output · intermediate
What is the output shape of the latent vector z in this VAE encoder snippet?

Given the following PyTorch encoder code snippet, what is the shape of the latent vector z after sampling?

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, 20)
        self.fc_logvar = nn.Linear(400, 20)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        z = mu + eps * std
        return z

encoder = Encoder()
x = torch.randn(64, 784)
z = encoder(x)
A. (64, 20)
B. (20, 64)
C. (64, 400)
D. (400, 20)
💡 Hint

Check the output size of fc_mu and fc_logvar.
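You can verify your answer by re-running the encoder from the snippet and inspecting the shape directly (this reproduces the code above unchanged):

```python
import torch
import torch.nn as nn

# Re-creating the encoder from the question to check the latent shape.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, 20)
        self.fc_logvar = nn.Linear(400, 20)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)  # eps has the same shape as std
        return mu + eps * std        # elementwise, so z keeps that shape

encoder = Encoder()
z = encoder(torch.randn(64, 784))
print(tuple(z.shape))
```

The reparameterization `mu + eps * std` is elementwise, so `z` inherits its shape from `fc_mu`'s output.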

Model Choice · advanced
Which decoder architecture is best suited for reconstructing 28x28 grayscale images in a VAE?

You want to build a decoder for a VAE that reconstructs 28x28 grayscale images (like MNIST). Which architecture is most appropriate?

A. A convolutional neural network with transposed convolutions to upsample from latent space to 28x28.
B. A recurrent neural network that generates pixels sequentially.
C. A fully connected network that outputs a vector of size 784, reshaped to 28x28.
D. A simple linear layer that outputs a single scalar value.
💡 Hint

Consider how spatial information is best preserved and reconstructed.
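For reference, here is a minimal sketch of a transposed-convolution decoder for 28x28 grayscale images. The layer sizes (64-channel 7x7 feature map, 4x4 kernels) are illustrative choices, not prescribed by the question:

```python
import torch
import torch.nn as nn

# Sketch: upsample a 20-dim latent vector to a 28x28 grayscale image
# by projecting to a 7x7 feature map and doubling the resolution twice.
class ConvDecoder(nn.Module):
    def __init__(self, latent_dim=20):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 7 * 7)  # latent -> 7x7 feature map
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 7 -> 14
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),   # 14 -> 28
            nn.Sigmoid(),  # pixel intensities in [0, 1]
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 7, 7)
        return self.deconv(h)

decoder = ConvDecoder()
x_recon = decoder(torch.randn(64, 20))
print(tuple(x_recon.shape))  # (64, 1, 28, 28)
```

Transposed convolutions reconstruct spatial structure progressively, which is why they are the usual choice over purely sequential or scalar outputs.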

Metrics · advanced
Which metric combination correctly evaluates a trained VAE on image reconstruction?

After training a VAE on images, which combination of metrics best evaluates its performance?

A. F1 score for reconstruction and BLEU score for latent space.
B. Mean Squared Error (MSE) for reconstruction quality and KL divergence for latent distribution regularization.
C. Precision and Recall for latent space clustering.
D. Accuracy for classification and Cross-Entropy loss for reconstruction.
💡 Hint

Think about what the VAE tries to minimize during training.
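As a sketch, evaluating a trained VAE typically means computing the same two quantities it minimizes during training. The random tensors below are stand-ins for a real batch and real model outputs:

```python
import torch
import torch.nn.functional as F

# Stand-in batch of flattened 28x28 images and their reconstructions.
x = torch.rand(64, 784)
x_recon = torch.rand(64, 784)
# Stand-in posterior parameters from the encoder.
mu = torch.zeros(64, 20)
logvar = torch.zeros(64, 20)

# Reconstruction quality: MSE between input and reconstruction.
recon_mse = F.mse_loss(x_recon, x)
# Latent regularization: KL divergence to N(0, 1), averaged per sample.
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
print(recon_mse.item(), kl.item())
```

Tracking both numbers separately is useful in practice: a low MSE with a large KL (or vice versa) signals an imbalance between reconstruction and regularization.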

🔧 Debug · expert
Why does this VAE training code raise a runtime error?

Consider this PyTorch training loop snippet for a VAE. Why does it raise a runtime error?

for data in dataloader:
    optimizer.zero_grad()
    x = data.view(-1, 784)
    z = encoder(x)
    x_recon = decoder(z)
    recon_loss = nn.functional.binary_cross_entropy(x_recon, x)
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon_loss + kl_loss
    loss.backward()
    optimizer.step()
A. The decoder output shape does not match the input shape, causing a shape mismatch error.
B. The input x is not normalized, causing the binary cross entropy to fail.
C. The optimizer is not zeroed before the backward pass, causing gradient accumulation errors.
D. Variables mu and logvar are not defined in the training loop, causing a NameError.
💡 Hint

Check where mu and logvar come from in the encoder.
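One common fix, sketched below with a toy encoder/decoder and a single synthetic batch (the model definitions and sizes here are illustrative assumptions): have the encoder return `mu` and `logvar` alongside `z`, so the KL term in the loop can reference them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, 20)
        self.fc_logvar = nn.Linear(400, 20)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar  # expose mu/logvar to the training loop

encoder = Encoder()
decoder = nn.Sequential(nn.Linear(20, 784), nn.Sigmoid())
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))

x = torch.rand(64, 784)  # synthetic batch in [0, 1], as BCE requires
optimizer.zero_grad()
z, mu, logvar = encoder(x)  # mu and logvar are now defined in the loop
x_recon = decoder(z)
recon_loss = F.binary_cross_entropy(x_recon, x)
kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl_loss
loss.backward()
optimizer.step()
print(loss.item())
```

Alternatively, the encoder could store `mu` and `logvar` as attributes, but returning them explicitly keeps the data flow visible in the loop.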