
β-VAE-GAN?

Looking at Frame by Frame Reconstruction, and reading through papers on VAE/GAN, I noticed that the quality of my reconstructions is terrible. It was also still bothering me that the KL-divergence term does not converge. So I added a β multiplier to the KL-divergence term and observed how it affects the behavior of the latent samples. The previously trained model corresponds to $\beta=1$, so I trained new models with β = 0.1, 2.0, 4.0, and 8.0.
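
For reference, the β-VAE objective is just the usual ELBO with the KL term weighted by β (setting β = 1 recovers the plain VAE trained before):

$$\mathcal{L}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z\mid x)}\left[\log p_\theta(x\mid z)\right] - \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right)$$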

Let's see how the KL divergence and the pixel error change.

The result sends me mixed signals. Before running these experiments, I was expecting that the increased pressure on the KL divergence would prevent it from growing. (It is still possible that the training period is too short and it simply has not reached a convergence point yet.) However, it looks like, no matter how large the KL divergence gets, the GAN (discriminator) part of the model will eventually find a way to amplify the key differences, and optimizing the feature-matching term encourages the encoder to produce samples that drift ever further from a normal distribution in the latent space.
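
To make the tension concrete, here is a rough sketch of the kind of encoder/decoder loss I am describing. This is not my exact training code; the module interfaces (`encoder`, `decoder`, `discriminator.features`) and shapes are illustrative assumptions. The point is that the reconstruction term lives in the discriminator's feature space, so only the β-weighted KL term pulls $q(z|x)$ toward the prior.

```python
import torch
import torch.nn.functional as F

def encoder_decoder_loss(x, encoder, decoder, discriminator, beta):
    """Hypothetical VAE-GAN encoder/decoder loss with feature matching.

    The reconstruction error is measured in the discriminator's feature
    space instead of pixel space, so nothing besides the beta-weighted
    KL term pushes q(z|x) toward N(0, I).
    """
    mu, logvar = encoder(x)                                   # q(z|x) parameters
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterized sample
    x_rec = decoder(z)

    # Feature-matching reconstruction error (discriminator feature layer,
    # assumed to be exposed as discriminator.features).
    feat_real = discriminator.features(x)
    feat_fake = discriminator.features(x_rec)
    rec_loss = F.mse_loss(feat_fake, feat_real)

    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    return rec_loss + beta * kl
```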

To see which term of the KL divergence is driving the deviation (for a diagonal Gaussian posterior, the KL splits into a mean term and a variance term), I recorded the distance of each encoded sample from the origin in the latent space.
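
Concretely, the quantity plotted is just the L2 norm of the encoded mean for each sample. A minimal sketch, assuming a PyTorch encoder that returns `(mu, logvar)`:

```python
import torch

@torch.no_grad()
def latent_distances(encoder, data_loader, device="cpu"):
    """Collect ||mu|| for every sample: the distance of the encoded
    mean from the origin of the latent space."""
    dists = []
    for x, *_ in data_loader:
        mu, _logvar = encoder(x.to(device))
        dists.append(mu.norm(dim=1).cpu())   # L2 norm per sample
    return torch.cat(dists)
```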

What surprised me was that the whole sample distribution is drifting away from the origin. Before seeing this plot, I was expecting the samples to be densely distributed around the origin, with some points close to it and some very far away. Instead, the distribution is either shaped like a ring or sits as one dense group far from the origin, like a galaxy. This also explains why fake images generated from random samples drawn from a normal distribution in the latent space do not resemble real images at all.
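
For clarity, "fake images from random samples" means decoding latent vectors drawn from the $N(0, I)$ prior; a tiny sketch (decoder interface assumed as above):

```python
import torch

@torch.no_grad()
def sample_fake_images(decoder, n, latent_dim, device="cpu"):
    """Decode random latent vectors drawn from the N(0, I) prior.
    If the encoded posterior has drifted far from the prior, these
    decoded images will not resemble the training data."""
    z = torch.randn(n, latent_dim, device=device)
    return decoder(z)
```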

This model, which I had believed to be a VAE-GAN, is neither a VAE nor a GAN. Now the question is: how can I fix this?

A couple of thoughts:

  • Instead of relying on the feature-matching error, optimize the pixel error with BCE.
  • Use a non-linear activation in the feature-extraction part of the discriminator.
  • Use WGAN-GP, whose critic loss correlates with image quality (a sketch of its gradient penalty follows this list).
  • Restart from a plain VAE / β-VAE.
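
For the WGAN-GP option, the piece that would replace the current adversarial loss is the gradient penalty. Below is a minimal sketch of the standard formulation, assuming a PyTorch discriminator that maps image batches to scalar scores; it is not tied to this project's code.

```python
import torch

def gradient_penalty(discriminator, real, fake, gp_weight=10.0):
    """WGAN-GP gradient penalty on samples interpolated between real
    and fake/reconstructed images (standard formulation)."""
    batch_size = real.size(0)
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)

    scores = discriminator(interp)
    grads = torch.autograd.grad(
        outputs=scores.sum(), inputs=interp, create_graph=True
    )[0]

    # Penalize deviation of the per-sample gradient norm from 1.
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return gp_weight * ((grad_norm - 1.0) ** 2).mean()
```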

References

  1. Reddit: Unifying VAEs and GANs
    It is not about VAE-GAN, but the discussion provides a good theoretical view.
  2. Ferenc: An Alternative Update Rule for Generative Adversarial Networks
    Linked from the above thread; it explains how to make a GAN optimize KL divergence instead of JS divergence.
  3. Reddit: VAEs are not Autoencoders and its linked post by Paul Rubenstein
    This post and discussion helped me see the relationship between the latent sample distribution and the generated data distribution.
  4. How does posterior collapse in VAEs happen?
    This discussion taught me how a powerful/flexible decoder relates to posterior collapse.
  5. VAE Loss Plot by shayneobrien
    In a vanilla VAE, the KL divergence converges nicely.