VAE

I implement a VAE whose encoder is an LSTM network and whose decoder is a convolutional network, trained on the MNIST dataset. MNIST is an image dataset, but each image can be treated as a sequence of rows: a 28 × 28 image becomes a 28-dimensional multivariate sequence of length 28.
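
As a quick illustration of this framing, here is a minimal sketch (the `images` array is a stand-in batch, not the project's actual loader):

```python
import numpy as np

# Sketch of the sequence framing: flat 784-pixel MNIST vectors become
# length-28 sequences of 28-dimensional rows.
images = np.random.rand(64, 784).astype(np.float32)  # stand-in for a real batch
sequences = images.reshape(-1, 28, 28)  # [batch, time steps, row features]
print(sequences.shape)  # (64, 28, 28): 28 rows, each a 28-dim vector
```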

How to Run

  • Install tensorflow==1.14.0 (strict requirement).
  • Run "python main.py" to train the model and generate 100 images from the decoder after training; training plus generation takes about 3 minutes.
  • new_images.png (the generated images) is written at the end of execution.
  • new_model.ckpt.meta (the trained model checkpoint) is written at the end of execution.

Network Architecture

  • Implementing a single-layer LSTM as the encoder of the VAE
    • See encoder() in the Network class of model.py. After the LSTM layer, dense layers compute mu and sigma of the latent space. encoder() thus maps an input image, through the LSTM and dense layers, to a proposed distribution over latent codes for that image; this distribution is the posterior. A sketch follows the list.
  • Implementing a convolutional decoder with transpose convolutional layers
    • The decoder takes a random vector sampled from the distribution output by the encoder and decodes it into a 28 × 28 grayscale image. Since it goes from a vector up to an image, transpose convolutions are needed for upsampling.
    • See decoder() in the Network class of model.py. Dense layers are used before the transpose convolutional layers. The decoder takes a posterior sample and maps it back to a distribution over images plausible for that sample, which lets us generate new images for any posterior sample we choose. I use a Bernoulli output distribution; it could be modeled differently, for example as a Normal distribution. A sketch follows the list.
  • Implementing the reconstruction loss with binary cross-entropy and the regularization term with KL-divergence
    • See optimization() in the Network class of model.py. I train the model by maximizing the evidence lower bound (ELBO), a tractable lower bound on the data likelihood. The important point is that the ELBO only needs the likelihood of a data point under our current estimate of its posterior, which we can sample. I maximize the ELBO with gradient descent (important note: the printed loss is the negative ELBO). Sampling from the posterior is implemented with the reparameterization trick so that TensorFlow can backpropagate through the samples. A sketch of the loss follows the list.
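
A minimal sketch of such an LSTM encoder in TensorFlow 1.x (the hidden size of 128 and latent_dim of 20 are illustrative assumptions, not necessarily what model.py uses):

```python
import tensorflow as tf  # tensorflow==1.14.0, as required above

def encoder(x, latent_dim=20):
    """Map images, read row by row, to the parameters of q(z|x)."""
    # x: [batch, 28, 28] -- each image is a length-28 sequence of 28-dim rows
    cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
    _, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
    h = state.h                               # final hidden state summarizes the sequence
    mu = tf.layers.dense(h, latent_dim)       # posterior mean
    log_var = tf.layers.dense(h, latent_dim)  # posterior log-variance
    return mu, log_var
```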
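
A matching decoder sketch that upsamples a latent sample to 28 × 28 Bernoulli logits with tf.layers.conv2d_transpose (again, the layer widths are illustrative assumptions):

```python
def decoder(z):
    """Map a posterior sample z to per-pixel Bernoulli logits."""
    h = tf.layers.dense(z, 7 * 7 * 32, activation=tf.nn.relu)
    h = tf.reshape(h, [-1, 7, 7, 32])
    h = tf.layers.conv2d_transpose(h, 64, 3, strides=2, padding='same',
                                   activation=tf.nn.relu)      # 7x7  -> 14x14
    h = tf.layers.conv2d_transpose(h, 32, 3, strides=2, padding='same',
                                   activation=tf.nn.relu)      # 14x14 -> 28x28
    logits = tf.layers.conv2d_transpose(h, 1, 3, strides=1, padding='same')
    return logits  # [batch, 28, 28, 1] Bernoulli logits over pixels
```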
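
Finally, a sketch of the loss: binary cross-entropy for the Bernoulli reconstruction term, the closed-form KL against a standard normal, and reparameterized sampling so gradients flow through z. The wiring below uses the hypothetical names from the sketches above, not optimization() itself:

```python
def negative_elbo(x, mu, log_var):
    """Negative ELBO = reconstruction loss + KL regularizer (to minimize)."""
    # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable
    eps = tf.random_normal(tf.shape(mu))
    z = mu + tf.exp(0.5 * log_var) * eps
    logits = decoder(z)
    labels = tf.reshape(x, [-1, 28, 28, 1])
    # Reconstruction term: binary cross-entropy = -log p(x|z) per pixel
    recon = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits),
        axis=[1, 2, 3])
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian posterior
    kl = 0.5 * tf.reduce_sum(
        tf.exp(log_var) + tf.square(mu) - 1.0 - log_var, axis=1)
    return tf.reduce_mean(recon + kl)

# Wiring it together (hypothetical placeholders):
x = tf.placeholder(tf.float32, [None, 28, 28])
mu, log_var = encoder(x)
loss = negative_elbo(x, mu, log_var)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```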

Results

The loss decreases during the first epochs and then shows little further improvement. The regularization term (which keeps the latent space regular) increases during the first epochs and then plateaus. The regularization term exists because two nearby points in the latent space should not decode to completely different contents, and a point sampled from the chosen latent distribution should decode to meaningful content. This is achieved by forcing the posterior distributions to stay close to a standard normal distribution. See Figure 1. Through training (especially in the first epochs), the model is pushed to regularize the latent space more.
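
Concretely, for a diagonal Gaussian posterior this closeness is measured by the KL divergence, which has the closed form

$$
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \sigma^2 I)\,\big\|\,\mathcal{N}(0, I)\right) = \frac{1}{2}\sum_{j}\left(\sigma_j^2 + \mu_j^2 - 1 - \log\sigma_j^2\right),
$$

which is zero exactly when the posterior equals the standard normal prior.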

Figure 1: Loss and regularization term values over training epochs.

Visualizing the generated samples

[Generated samples: the 100 images produced by the decoder; see new_images.png]
