For Variational Autoencoders (VAEs), the key metric is the Evidence Lower Bound (ELBO). ELBO combines two parts: the reconstruction loss and the Kullback-Leibler (KL) divergence.
The reconstruction loss measures how well the model can recreate the input data from its compressed form. The KL divergence measures how close the learned latent space is to a prior distribution, typically a standard normal distribution.
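When both the encoder's posterior and the prior are diagonal Gaussians, the KL term has a simple closed form. The sketch below computes it with NumPy; the `mu` and `log_var` values are hypothetical stand-ins for what an encoder would output:

```python
import numpy as np

# Hypothetical per-example latent parameters from the encoder
mu = np.array([0.5, -0.3])       # latent means
log_var = np.array([-0.2, 0.1])  # latent log-variances

# Closed-form KL divergence between N(mu, sigma^2) and the standard
# normal prior N(0, I), summed over latent dimensions:
#   KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
```

The KL is zero exactly when the posterior matches the prior (`mu = 0`, `log_var = 0`) and grows as the learned latent distribution drifts away from it.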
During training we want to maximize the ELBO; in practice, optimizers minimize the negative ELBO used as the loss. A high ELBO means the model reconstructs inputs faithfully while keeping its latent space close to the prior, so it is good at both compressing data and generating samples that look like the original.
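Putting the two terms together, a minimal sketch of the negative-ELBO loss might look like the following. It assumes a Gaussian decoder (so the reconstruction term is a squared error) and a diagonal-Gaussian encoder against a standard normal prior; the input values are hypothetical:

```python
import numpy as np

def negative_elbo(x, x_recon, mu, log_var):
    """Negative ELBO: reconstruction error plus KL regularizer.

    Assumes a Gaussian decoder (squared-error reconstruction) and a
    diagonal-Gaussian encoder with a standard normal prior.
    """
    recon = np.sum((x - x_recon) ** 2)
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return recon + kl

# Toy example with made-up inputs and latent parameters
x = np.array([1.0, 0.0, 1.0])
x_recon = np.array([0.9, 0.1, 0.8])
loss = negative_elbo(x, x_recon, mu=np.array([0.2]), log_var=np.array([0.0]))
```

Minimizing this quantity lowers the reconstruction error and pulls the latent distribution toward the prior at the same time, which is the trade-off the ELBO encodes.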