PyTorch · ML · ~8 mins

Variational Autoencoder in PyTorch - Model Metrics & Evaluation

Which metric matters for Variational Autoencoder and WHY

For Variational Autoencoders (VAEs), the key metric is the Evidence Lower Bound (ELBO). In practice we minimize the VAE loss, which is the negative ELBO and combines two parts: the reconstruction loss and the Kullback-Leibler (KL) divergence.

The reconstruction loss measures how well the model can recreate the input data from its compressed form. The KL divergence measures how close the learned latent space is to a prior distribution, typically a standard normal distribution.

We want to minimize this total loss (equivalently, maximize the ELBO): a low loss means the model is good at both reconstructing its inputs and keeping a latent space it can sample from to generate new data.
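As a minimal sketch of how this combined loss is often computed in PyTorch (the function name `vae_loss` is illustrative, the binary cross-entropy reconstruction term assumes inputs in [0, 1], and the KL term uses the closed form for a diagonal Gaussian posterior against a standard normal prior):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    """Total VAE loss = reconstruction loss + KL divergence (negative ELBO)."""
    # Reconstruction term: how well the decoder recreates the input.
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # KL term, closed form for q(z|x) = N(mu, sigma^2) vs. prior N(0, I):
    # KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl, recon, kl
```

With `reduction="sum"` the two terms are on a comparable per-sample scale; switching to `"mean"` would implicitly reweight the KL term.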

Confusion matrix or equivalent visualization

VAEs are unsupervised models, so they don't use confusion matrices like classifiers.

Instead, we look at the loss curve during training:

Epoch | Reconstruction Loss | KL Divergence | Total ELBO Loss
-----------------------------------------------------------
  1   |       150.3         |     12.5      |     162.8
  2   |       120.7         |     10.2      |     130.9
  3   |       110.1         |      9.8      |     119.9
  ...

Lower total ELBO loss over epochs means the model is learning better.
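As a sketch, a tiny VAE trained on random stand-in data can produce this kind of per-epoch log (the model, data, and hyperparameters here are illustrative, not from the article):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal VAE used only to illustrate per-epoch loss logging."""
    def __init__(self, d_in=16, d_latent=2):
        super().__init__()
        self.enc = nn.Linear(d_in, d_latent * 2)  # outputs mu and logvar
        self.dec = nn.Linear(d_latent, d_in)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return torch.sigmoid(self.dec(z)), mu, logvar

torch.manual_seed(0)
x = torch.rand(64, 16)          # stand-in data in [0, 1]
model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(1, 4):
    recon_x, mu, logvar = model(x)
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + kl
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"Epoch {epoch} | recon {recon.item():.1f} | "
          f"KL {kl.item():.1f} | total {loss.item():.1f}")
```

The exact numbers depend on initialization; the point is logging the reconstruction and KL terms separately alongside their sum.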

Precision vs Recall tradeoff (or equivalent) with concrete examples

VAEs balance two goals:

  • Reconstruction quality: How close the output is to the input.
  • Latent space regularization: How well the latent space follows a prior distribution, usually a normal distribution.

If we focus too much on reconstruction, the latent space may not be smooth or useful for generating new data.

If we focus too much on the KL divergence, the model may produce blurry or poor reconstructions.

Example: in MNIST image generation, a good VAE both reconstructs images that look like real handwritten digits and can create new digits by sampling from the latent space.
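One common way to control this tradeoff explicitly is a weight β on the KL term, as in the β-VAE. A sketch, assuming `recon` and `kl` have already been computed (the function name and numbers are illustrative):

```python
import torch

def weighted_vae_loss(recon, kl, beta=1.0):
    """beta < 1 favors sharp reconstructions; beta > 1 favors a smoother,
    more regularized latent space (at the cost of blurrier outputs)."""
    return recon + beta * kl

recon = torch.tensor(110.1)
kl = torch.tensor(9.8)
print(weighted_vae_loss(recon, kl, beta=0.5))  # leans toward reconstruction
print(weighted_vae_loss(recon, kl, beta=4.0))  # leans toward regularization
```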

What "good" vs "bad" metric values look like for Variational Autoencoder

Good:

  • ELBO loss steadily decreases during training.
  • Reconstruction loss is low, meaning outputs look like inputs.
  • KL divergence is not zero but balanced, indicating a meaningful latent space.
  • Generated samples from latent space look realistic and diverse.

Bad:

  • ELBO loss stays high or fluctuates wildly.
  • Reconstruction loss is very high, outputs are blurry or wrong.
  • KL divergence is near zero (posterior collapse), meaning latent space is ignored.
  • Generated samples look random or all the same.

Metrics pitfalls

  • Posterior collapse: KL divergence goes to zero, model ignores latent space, reconstruction may still be okay but generation fails.
  • Overfitting: Very low reconstruction loss on training but poor generation or high loss on new data.
  • Ignoring KL term: If KL weight is too low, latent space won't be regularized, hurting generation quality.
  • Misinterpreting loss: the VAE loss is the negative ELBO (reconstruction term plus KL), so lower is better, but absolute values depend on the data scale and dimensionality.
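Posterior collapse can be checked directly by looking at the per-dimension KL contribution: dimensions whose KL stays near zero are not carrying information. A sketch (the function name and the 0.01 threshold are rules of thumb, not standard values):

```python
import torch

def active_units(mu_batch, logvar_batch, threshold=0.01):
    """Count latent dimensions whose batch-averaged KL exceeds a threshold.

    Near-zero per-dimension KL is a common symptom of posterior collapse
    in that dimension: the posterior has matched the prior exactly.
    """
    kl_per_dim = -0.5 * (1 + logvar_batch - mu_batch.pow(2) - logvar_batch.exp())
    return int((kl_per_dim.mean(dim=0) > threshold).sum())

# Collapsed case: posterior equals the prior N(0, 1) in every dimension.
mu = torch.zeros(32, 8)
logvar = torch.zeros(32, 8)
print(active_units(mu, logvar))  # prints 0

# Informative case: means spread out, so per-dimension KL is well above zero.
torch.manual_seed(0)
mu2 = torch.randn(32, 8) * 2.0
print(active_units(mu2, logvar))  # all 8 units active
```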

Self-check question

Your VAE model has a reconstruction loss of 100 and KL divergence of 0.1 after training. Is this good?

Answer: The KL divergence is very close to zero, which suggests posterior collapse: the model is ignoring the latent space. Even if the reconstruction loss looks acceptable, the model may not have learned a useful latent representation. Common fixes are to warm up the KL weight gradually from a small value (KL annealing) or to constrain the decoder so it is forced to use the latent code.
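A common mitigation for posterior collapse is KL warm-up: annealing the KL weight from 0 toward 1 over the first epochs, so the decoder learns to use the latent code before the KL term pushes the posterior onto the prior. A minimal linear schedule (the function name and warm-up length are illustrative):

```python
def kl_weight(epoch, warmup_epochs=10):
    """Linearly anneal the KL weight from 0 to 1 over the warm-up period."""
    return min(1.0, epoch / warmup_epochs)

for epoch in [0, 5, 10, 20]:
    print(epoch, kl_weight(epoch))  # 0.0, 0.5, 1.0, 1.0
```

The returned weight would multiply the KL term in the loss, exactly where a fixed β would otherwise go.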

Key Result
ELBO loss (reconstruction + KL divergence) is the key metric to evaluate Variational Autoencoders.