Variational Autoencoders (VAEs)

Variational Autoencoders (VAEs) are a class of generative models that use deep learning techniques to learn a probabilistic representation of data. VAEs are particularly useful for tasks such as unsupervised learning, dimensionality reduction, and generating new samples from the learned data distribution. In this glossary entry, we will discuss the key concepts and components of VAEs, including the encoder, decoder, latent space, and the variational inference technique used for training.

Encoder

The encoder is a neural network that maps an input data point to the parameters of a distribution over a lower-dimensional latent space. In doing so, it learns to approximate the posterior distribution of the latent variables given the input data, denoted q(z|x). In VAEs, the encoder outputs two sets of parameters: the mean (μ) and the log variance (log(σ²)) of a Gaussian distribution. These parameters define the distribution of the latent variables for a given input data point.
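As a rough illustration, the sketch below shows what such an encoder might look like in PyTorch. The layer sizes, variable names, and the single hidden layer are arbitrary assumptions for the example, not part of any particular VAE implementation.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input x to the mean and log-variance of q(z|x)."""
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(hidden_dim, latent_dim)   # log variance of q(z|x)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.logvar(h)
```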

Latent Space

The latent space is a lower-dimensional representation of the input data, where each point in the latent space corresponds to an encoding of an input data point. VAEs assume that each data point is generated from a latent variable drawn from a simple prior distribution over this space, typically a standard Gaussian p(z) = N(0, I). This assumption allows VAEs to generate new data points by sampling from the prior and decoding the samples back into the original data space, as illustrated in the sketch below.
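A minimal generation sketch, assuming a trained decoder network named `decoder` (such as the one sketched in the Decoder section below) and a 20-dimensional latent space; both are illustrative assumptions.

```python
import torch

# `decoder` is assumed to be a trained decoder network; the latent
# dimensionality (20) is likewise an assumption for this example.
z = torch.randn(64, 20)           # 64 latent vectors sampled from the prior N(0, I)
with torch.no_grad():
    new_samples = decoder(z)      # decode each latent vector into the data space
```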

Decoder

The decoder is another neural network that takes a point in the latent space and maps it back to the original data space. The decoder learns to approximate the likelihood of the data given the latent variables, denoted as p(x|z). In VAEs, the decoder outputs the parameters of a distribution in the data space, which can be used to reconstruct the input data point from the latent space representation.
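A minimal decoder sketch mirroring the encoder above. The sigmoid output is one common choice, appropriate when the data are treated as binary values with a Bernoulli likelihood; other output distributions are equally valid, and the layer sizes are again arbitrary assumptions.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a latent vector z to the parameters of p(x|z)."""
    def __init__(self, latent_dim=20, hidden_dim=400, output_dim=784):
        super().__init__()
        self.hidden = nn.Linear(latent_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        h = torch.relu(self.hidden(z))
        # Sigmoid outputs can be read as Bernoulli means for binary data.
        return torch.sigmoid(self.out(h))
```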

Variational Inference

Variational inference is a technique for approximating intractable posterior distributions in Bayesian models. In the context of VAEs, it is used to optimize the encoder and decoder networks by minimizing the difference between the true posterior distribution p(z|x) and the approximate posterior distribution q(z|x) learned by the encoder. This difference is measured with the Kullback-Leibler (KL) divergence, which quantifies the dissimilarity between two probability distributions. Because the true posterior is intractable, this KL divergence cannot be minimized directly; instead, VAEs maximize a tractable lower bound on the data log-likelihood, described next.

The objective function for VAEs, known as the Evidence Lower Bound (ELBO), consists of two terms: a reconstruction term and a KL divergence term. The reconstruction term measures how well the decoder reproduces the input data from the sampled latent variables, while the KL divergence term measures the difference between the approximate posterior q(z|x) and the prior p(z), acting as a regularizer on the latent space. By maximizing the ELBO, VAEs learn to generate accurate reconstructions of the input data while maintaining a smooth and continuous latent space.
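The following sketch expresses the negative ELBO as a loss to be minimized. It assumes a Bernoulli likelihood (binary cross-entropy reconstruction term) and a standard Gaussian prior, for which the KL divergence has the closed form shown in the comment; other likelihoods would change the reconstruction term.

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, x_recon, mu, logvar):
    """Negative ELBO = reconstruction loss + KL(q(z|x) || p(z))."""
    # Reconstruction term: -E_q[log p(x|z)] under a Bernoulli likelihood.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(N(mu, sigma^2) || N(0, I)) = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```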

Training VAEs

Training VAEs involves optimizing the encoder and decoder networks using stochastic gradient descent (SGD) or other optimization algorithms such as Adam. During training, the encoder network takes an input data point and outputs the parameters of the approximate posterior distribution q(z|x). A sample is then drawn from this distribution and passed through the decoder network to generate a reconstruction of the input data point. Because sampling is not itself differentiable, the sample is drawn with the reparameterization trick, z = μ + σ ⊙ ε with ε ~ N(0, I), so that gradients can flow back through μ and σ. The ELBO is computed for each data point, and its gradients with respect to the network parameters are used to update the encoder and decoder networks.
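Putting the pieces together, a minimal training step might look like the sketch below. It assumes the Encoder, Decoder, and negative_elbo sketches above, plus a `data_loader` yielding batches of flattened inputs with values in [0, 1]; all of these names and shapes are assumptions made for the example.

```python
import torch

encoder, decoder = Encoder(), Decoder()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for x in data_loader:                  # assumed iterable of flattened batches in [0, 1]
    mu, logvar = encoder(x)            # parameters of q(z|x)
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    z = mu + eps * std                 # reparameterization trick: z = mu + sigma * eps
    x_recon = decoder(z)               # reconstruction of x
    loss = negative_elbo(x, x_recon, mu, logvar)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```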

In summary, Variational Autoencoders (VAEs) are powerful generative models that learn a probabilistic representation of data using deep learning techniques. VAEs consist of an encoder and a decoder network, which map data points between the original data space and a lower-dimensional latent space. The training of VAEs is based on the variational inference technique, which optimizes the encoder and decoder networks by minimizing the difference between the true and approximate posterior distributions. VAEs have numerous applications in unsupervised learning, dimensionality reduction, and data generation.