PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications - Salimans - ICLR 2017 - TensorFlow Code

Info

Title: PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
Task: Image Generation
Author: T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma
Date: Jan. 2017
Arxiv: 1701.05517
Published: ICLR 2017
Affiliation: OpenAI

Highlights & Drawbacks

Use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which speeds up training.
Condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure.
Use downsampling to efficiently capture structure at multiple resolutions.
Introduce additional shortcut connections to further speed up optimization.
Regularize the model using dropout.
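
To make whole-pixel conditioning concrete, here is a minimal sketch in plain Python. It assumes the parameterization described in the paper: each mixture component has one weight shared by all three channels, and the means of green and blue are shifted linearly by the channels already generated ($μ_{g} + α r$ and $μ_{b} + β r + γ g$). The function names and argument layout are hypothetical, and values stay on the raw 0 to 255 scale for readability.

```python
import math

def sigmoid(z):
    # Numerically stable logistic sigmoid (the standard logistic CDF).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discretized_logistic(x, mu, s):
    # Probability mass of the bin around x in 0..255; edge bins absorb the tails.
    upper = 1.0 if x == 255 else sigmoid((x + 0.5 - mu) / s)
    lower = 0.0 if x == 0 else sigmoid((x - 0.5 - mu) / s)
    return upper - lower

def pixel_prob(r, g, b, pis, mus, scales, coeffs):
    """Probability of a whole RGB pixel under a K-component mixture (hypothetical API).

    pis[i]    : mixture weight shared by the three channels
    mus[i]    : (mu_r, mu_g, mu_b) predicted from the image context
    scales[i] : (s_r, s_g, s_b)
    coeffs[i] : (alpha, beta, gamma) linear dependencies between channels
    """
    total = 0.0
    for pi, (mu_r, mu_g, mu_b), (s_r, s_g, s_b), (alpha, beta, gamma) in zip(
            pis, mus, scales, coeffs):
        # Green's mean depends on red; blue's mean depends on red and green.
        mean_g = mu_g + alpha * r
        mean_b = mu_b + beta * r + gamma * g
        total += pi * (discretized_logistic(r, mu_r, s_r)
                       * discretized_logistic(g, mean_g, s_g)
                       * discretized_logistic(b, mean_b, s_b))
    return total
```

Because the channels of a component share one weight, summing `pixel_prob` over all green and blue values for a fixed red value recovers the mixture probability of red alone. The factorization into red, then green given red, then blue given red and green is what lets the network predict a whole pixel in one step.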

Motivation & Design

Discretized logistic mixture likelihood

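We assume a latent continuous color intensity $ν$ that is rounded to its nearest 8-bit value to give the observed sub-pixel value $x$. By choosing a simple continuous distribution for modeling $ν$, we obtain a smooth and memory-efficient predictive distribution for $x$. Here, we take this continuous univariate distribution to be a mixture of $K$ logistic distributions with mixture weights $π_{i}$, means $μ_{i}$ and scales $s_{i}$:

$$ν \sim \sum_{i = 1}^{K} π_{i} \, \mathrm{logistic}(μ_{i}, s_{i})$$

This allows us to easily calculate the probability of the observed discretized value $x$. For all sub-pixel values $x$ except the edge cases 0 and 255, we have:

$$P(x \mid π, μ, s) = \sum_{i = 1}^{K} π_{i} \left[ σ\left(\frac{x + 0.5 - μ_{i}}{s_{i}}\right) - σ\left(\frac{x - 0.5 - μ_{i}}{s_{i}}\right) \right]$$

where $σ$ is the logistic sigmoid, i.e. the CDF of the standard logistic distribution. For $x = 0$, the term $x - 0.5$ is replaced by $-\infty$, and for $x = 255$, the term $x + 0.5$ is replaced by $+\infty$, so the probabilities over all 256 values sum to one.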
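
The likelihood can be transcribed almost directly into plain Python. In this sketch the function names are hypothetical and the mixture parameters are illustrative, not values from the paper; the edge bins 0 and 255 are treated as extending to minus and plus infinity. A practical implementation would usually work with log-probabilities for numerical stability; this version stays in probability space for readability.

```python
import math

def logistic_cdf(z):
    # Numerically stable logistic sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discretized_logistic_mixture(x, pis, mus, scales):
    """P(x | pi, mu, s) for an 8-bit sub-pixel value x in 0..255.

    Each component contributes the logistic probability mass of the bin
    [x - 0.5, x + 0.5]; the edge bins extend to -inf (x = 0) and +inf (x = 255).
    """
    if not 0 <= x <= 255:
        raise ValueError("x must be an integer in 0..255")
    prob = 0.0
    for pi, mu, s in zip(pis, mus, scales):
        upper = 1.0 if x == 255 else logistic_cdf((x + 0.5 - mu) / s)
        lower = 0.0 if x == 0 else logistic_cdf((x - 0.5 - mu) / s)
        prob += pi * (upper - lower)
    return prob

# Illustrative two-component mixture.
pis, mus, scales = [0.6, 0.4], [40.0, 200.0], [6.0, 12.0]
total = sum(discretized_logistic_mixture(x, pis, mus, scales) for x in range(256))
print(round(total, 6))  # 1.0
```

The probabilities sum to one because adjacent bins share their boundaries, so the sigmoid terms telescope, and the edge bins absorb all mass that falls outside 0 to 255.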

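Because the network outputs only the parameters $π_{i}$, $μ_{i}$ and $s_{i}$ of the $K$ mixture components instead of a 256-way softmax for every sub-pixel, its output is of much lower dimension, yielding much denser gradients of the loss with respect to the parameters.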
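
A little arithmetic shows the size of the reduction. This sketch assumes a 256-way softmax per sub-pixel for the baseline and, for the mixture, $K$ components per whole pixel, each with one shared weight, three means, three scales and three linear channel coefficients; the values of $K$ are illustrative and the helper names are hypothetical.

```python
def softmax_outputs_per_pixel(channels=3, levels=256):
    # One logit per possible 8-bit value, for every sub-pixel.
    return channels * levels

def mixture_outputs_per_pixel(k):
    # Per component: 1 shared weight, 3 means, 3 scales, 3 channel coefficients.
    return k * (1 + 3 + 3 + 3)

pixels = 32 * 32  # a 32x32 image, used only for illustration
print(softmax_outputs_per_pixel())    # 768
print(mixture_outputs_per_pixel(5))   # 50
print(mixture_outputs_per_pixel(10))  # 100
print(pixels * softmax_outputs_per_pixel(), pixels * mixture_outputs_per_pixel(10))  # 786432 102400
```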

More residual connections

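As a rough illustration of the extra short-cut connections, the sketch below runs an upward pass that downsamples 32x32 to 16x16 to 8x8 and stores the activation at each resolution, then a downward pass that upsamples back and adds each stored activation to the layer at the same resolution, U-Net style. All helpers are hypothetical stand-ins: average pooling and nearest-neighbour repetition replace the strided and transposed convolutions, and the causal masking and two-stream structure of the real PixelCNN are omitted, so only the wiring of the short-cuts is shown.

```python
import numpy as np

def downsample(h):
    # Stand-in for a stride-2 convolution: 2x2 average pooling of an (H, W, C) map.
    height, width, channels = h.shape
    return h.reshape(height // 2, 2, width // 2, 2, channels).mean(axis=(1, 3))

def upsample(h):
    # Stand-in for a stride-2 transposed convolution: nearest-neighbour upsampling.
    return h.repeat(2, axis=0).repeat(2, axis=1)

def resnet_block(h):
    # Stand-in for a stack of residual layers; any shape-preserving map will do.
    return h + np.tanh(h)

def two_pass_with_shortcuts(x, n_scales=3):
    skips = []
    h = x
    for scale in range(n_scales):              # upward pass: 32x32 -> 16x16 -> 8x8
        h = resnet_block(h)
        skips.append(h)                        # remember the activation at this resolution
        if scale < n_scales - 1:
            h = downsample(h)
    for scale in reversed(range(n_scales)):    # downward pass: 8x8 -> 16x16 -> 32x32
        h = resnet_block(h + skips.pop())      # short-cut from the matching upward layer
        if scale > 0:
            h = upsample(h)
    return h

x = np.random.default_rng(0).normal(size=(32, 32, 4))
print(two_pass_with_shortcuts(x).shape)  # (32, 32, 4)
```

Popping from the list pairs the deepest upward activation with the first downward layer, so each short-cut always joins feature maps of the same resolution and information lost by downsampling can flow directly to the output side.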

Performance & Ablation Study


Training on CIFAR-10 on a machine with 8 Maxwell TITAN X GPUs reaches 3.0 bits per dimension in about 10 hours, and it takes approximately 5 days to converge to 2.92 bits per dimension.
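
Bits per dimension is the negative log-likelihood in base 2 divided by the number of sub-pixels, so these numbers translate directly into a code length per image. The helper names below are hypothetical, and the default shape assumes 32x32 RGB images.

```python
import math

def bits_per_dim(nll_nats, num_dims):
    # Convert a per-image negative log-likelihood in nats to bits per dimension.
    return nll_nats / (num_dims * math.log(2))

def bits_per_image(bpd, height=32, width=32, channels=3):
    # Total code length in bits for one image at a given bits/dim.
    return bpd * height * width * channels

print(round(bits_per_image(3.0), 2))   # 9216.0
print(round(bits_per_image(2.92), 2))  # 8970.24
print(round(bits_per_image(8.0), 2))   # 24576.0, a uniform model over 256 values
```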