Info
- Title: PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
- Task: Image Generation
- Authors: T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma
- Date: Jan. 2017
- Arxiv: 1701.05517
- Published: ICLR 2017
- Affiliation: OpenAI
Highlights & Drawbacks
- A discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which speeds up training.
- Condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure.
- Downsampling to efficiently capture structure at multiple resolutions.
- Additional shortcut connections to further speed up optimization.
- Dropout regularization to prevent overfitting.
Motivation & Design
Discretized logistic mixture likelihood
We assume there is a latent color intensity $\nu$ with a continuous distribution, which is rounded to its nearest 8-bit value to produce the observed sub-pixel value $x$. By choosing a simple continuous distribution for modeling $\nu$ we obtain a smooth and memory-efficient predictive distribution for $x$. Here, this continuous univariate distribution is taken to be a mixture of $K$ logistic distributions, which allows the probability of the observed discretized value $x$ to be computed in closed form:

$$\nu \sim \sum_{i=1}^{K} \pi_i \, \mathrm{logistic}(\mu_i, s_i)$$

For all sub-pixel values $x$ except the edge cases 0 and 255 we have:

$$P(x \mid \pi, \mu, s) = \sum_{i=1}^{K} \pi_i \left[ \sigma\!\left(\frac{x + 0.5 - \mu_i}{s_i}\right) - \sigma\!\left(\frac{x - 0.5 - \mu_i}{s_i}\right) \right]$$

where $\sigma(\cdot)$ is the logistic sigmoid (the CDF of the logistic distribution). For the edge case 0, $x - 0.5$ is replaced by $-\infty$, and for 255, $x + 0.5$ is replaced by $+\infty$.
The output of our network is thus of much lower dimension, yielding much denser gradients of the loss with respect to our parameters.
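A minimal NumPy sketch of this likelihood for a single sub-pixel value, matching the equation above. The function and variable names are illustrative and do not come from the authors' released code, which also rescales inputs to $[-1, 1]$ rather than working on the raw 0-255 scale used here.

```python
import numpy as np

def discretized_logistic_mixture_logpmf(x, pi, mu, s, num_bins=256):
    """Log-probability of an integer sub-pixel value x in {0, ..., 255}
    under a K-component discretized logistic mixture with weights pi,
    means mu, and scales s (each an array of shape (K,))."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # CDF of the latent logistic evaluated at the bin edges x +- 0.5.
    cdf_plus = sigmoid((x + 0.5 - mu) / s)
    cdf_minus = sigmoid((x - 0.5 - mu) / s)
    # Edge cases: bin 0 integrates from -inf, bin 255 integrates to +inf.
    if x == 0:
        cdf_minus = np.zeros_like(cdf_minus)
    if x == num_bins - 1:
        cdf_plus = np.ones_like(cdf_plus)
    prob = np.sum(pi * (cdf_plus - cdf_minus))
    return np.log(np.maximum(prob, 1e-12))  # clip for numerical safety

# Example: a 5-component mixture evaluated at pixel value 128.
K = 5
pi = np.full(K, 1.0 / K)
mu = np.linspace(0.0, 255.0, K)
s = np.full(K, 10.0)
print(discretized_logistic_mixture_logpmf(128, pi, mu, s))
```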
More residual connections
Downsampling with strided convolutions captures structure at multiple resolutions efficiently, but it discards information. To compensate, shortcut connections are added from each downsampling layer to the upsampling layer of matching spatial resolution, similar to the encoder-decoder skip connections of VAEs and U-Nets, which also speeds up optimization.
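A toy sketch of this layout, assuming plain (unmasked) convolutions and concatenation-based shortcuts; the released code instead routes shortcuts into gated residual blocks and preserves the autoregressive masking, and all layer sizes here are illustrative.

```python
import torch
import torch.nn as nn

class DownUpWithShortcuts(nn.Module):
    """Toy sketch of the U-Net-like layout: strided convolutions
    downsample, transposed convolutions upsample, and shortcut
    connections feed each downsampling layer's output into the
    upsampling layer of the same resolution."""
    def __init__(self, ch=32):
        super().__init__()
        self.down1 = nn.Conv2d(3, ch, 3, stride=2, padding=1)            # 32x32 -> 16x16
        self.down2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)           # 16x16 -> 8x8
        self.up1 = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)    # 8x8 -> 16x16
        self.up2 = nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1)  # 16x16 -> 32x32

    def forward(self, x):
        h1 = torch.relu(self.down1(x))   # saved for the shortcut
        h2 = torch.relu(self.down2(h1))
        u1 = torch.relu(self.up1(h2))
        # Shortcut: concatenate features of matching resolution, so
        # information discarded by downsampling can be recovered.
        return self.up2(torch.cat([u1, h1], dim=1))

x = torch.randn(1, 3, 32, 32)
print(DownUpWithShortcuts()(x).shape)  # torch.Size([1, 32, 32, 32])
```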
Performance & Ablation Study
Training on CIFAR-10 with a machine with 8 Maxwell TITAN X GPUs reaches 3.0 bits per dimension in about 10 hours; converging to the reported 2.92 bits per dimension takes approximately 5 days.
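For reference, bits per dimension is the negative log-likelihood in nats divided by the number of sub-pixel dimensions and by $\ln 2$. A quick sanity-check computation, where the NLL value is made up purely for illustration:

```python
import numpy as np

# A CIFAR-10 image has 32 * 32 * 3 = 3072 sub-pixel dimensions.
num_dims = 32 * 32 * 3

# Hypothetical total NLL of one image in nats, chosen only for illustration.
nll_nats = 6220.0

bits_per_dim = nll_nats / (num_dims * np.log(2.0))
print(f"{bits_per_dim:.2f} bits/dim")  # ~2.92
```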
Code
- Official TensorFlow implementation (OpenAI): https://github.com/openai/pixel-cnn
Related
- Gated PixelCNN: Conditional Image Generation with PixelCNN Decoders - van den Oord - NIPS 2016
- PixelRNN & PixelCNN: Pixel Recurrent Neural Networks - van den Oord - ICML 2016
- VQ-VAE: Neural Discrete Representation Learning - van den Oord - NIPS 2017
- VQ-VAE-2: Generating Diverse High-Fidelity Images with VQ-VAE-2 - Razavi - NeurIPS 2019