PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications - Salimans - ICLR 2017 - TensorFlow Code

 

Info

  • Title: PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
  • Task: Image Generation
  • Authors: T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma
  • Date: Jan. 2017
  • Arxiv: 1701.05517
  • Published: ICLR 2017
  • Affiliation: OpenAI

Highlights & Drawbacks

  • A discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which speeds up training.
  • Conditioning on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure.
  • Downsampling to efficiently capture structure at multiple resolutions.
  • Additional shortcut connections to further speed up optimization.
  • Dropout regularization to avoid overfitting.

Motivation & Design

Discretized logistic mixture likelihood

The model assumes a latent color intensity $\nu$ with a continuous distribution, which is rounded to the nearest 8-bit value to give the observed pixel $x$. By choosing a simple continuous distribution for $\nu$, we obtain a smooth and memory-efficient predictive distribution for $x$. Here, this continuous univariate distribution is taken to be a mixture of $K$ logistic distributions, which makes the probability of the observed discretized value $x$ easy to compute. For all sub-pixel values $x$ except the edge cases 0 and 255, we have:

$$P(x \mid \pi, \mu, s) = \sum_{i=1}^{K} \pi_i \left[ \sigma\!\left(\frac{x + 0.5 - \mu_i}{s_i}\right) - \sigma\!\left(\frac{x - 0.5 - \mu_i}{s_i}\right) \right]$$

where $\sigma(\cdot)$ is the logistic sigmoid, i.e. the CDF of the logistic distribution. For the edge case 0, $x - 0.5$ is replaced by $-\infty$, and for 255, $x + 0.5$ is replaced by $+\infty$.

The output of the network is thus of much lower dimension (for a mixture of 10 logistics, 100 parameters per pixel instead of $3 \times 256$ softmax logits), yielding much denser gradients of the loss with respect to the parameters.
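As a concrete illustration, here is a minimal NumPy sketch of the per-pixel probability mass above. The function and argument names are ours, not the official implementation's, and the official code rescales pixels to $[-1, 1]$ and evaluates everything in log-space for numerical stability:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discretized_logistic_mixture_pmf(x, logit_pi, mu, log_s):
    """P(x) for an integer pixel value x in [0, 255] under a
    K-component discretized logistic mixture (hypothetical helper).

    logit_pi, mu, log_s: shape-(K,) arrays holding the unnormalized
    mixture log-weights, component means, and log-scales.
    """
    pi = np.exp(logit_pi - logit_pi.max())
    pi /= pi.sum()                               # softmax mixture weights
    s = np.exp(log_s)
    # Logistic CDF evaluated half a bin above and half a bin below x
    cdf_plus = sigmoid((x + 0.5 - mu) / s)
    cdf_minus = sigmoid((x - 0.5 - mu) / s)
    if x == 0:                                   # edge bin extends to -inf
        cdf_minus = np.zeros_like(cdf_minus)
    elif x == 255:                               # edge bin extends to +inf
        cdf_plus = np.ones_like(cdf_plus)
    return float(np.sum(pi * (cdf_plus - cdf_minus)))
```

For example, `discretized_logistic_mixture_pmf(128, np.zeros(5), np.full(5, 128.0), np.full(5, 2.0))` returns the mass the mixture assigns to the bin $[127.5, 128.5]$.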

More residual connections

To capture long-range structure cheaply, the model downsamples with strided convolutions rather than relying on dilation, and compensates for the information lost by adding shortcut connections from the downsampling layers to the corresponding upsampling layers, similar to a U-Net; a sketch of this wiring follows.
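A hedged Keras sketch of the downsample/upsample wiring with shortcut connections. Layer sizes are illustrative, the shortcuts are combined here by simple concatenation (the official code feeds them into gated residual blocks), and the causal masking / shifted convolutions that an actual PixelCNN requires are omitted:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_sketch(h=32, w=32, c=3, nf=64, n_mix=10):
    x_in = layers.Input((h, w, c))
    # Downsampling path: strided convolutions halve the resolution
    d1 = layers.Conv2D(nf, 3, padding="same", activation="relu")(x_in)
    d2 = layers.Conv2D(nf, 3, strides=2, padding="same", activation="relu")(d1)
    d3 = layers.Conv2D(nf, 3, strides=2, padding="same", activation="relu")(d2)
    # Upsampling path: shortcut connections from the matching
    # downsampling layers recover information lost by downsampling
    u2 = layers.Conv2DTranspose(nf, 3, strides=2, padding="same", activation="relu")(d3)
    u2 = layers.Concatenate()([u2, d2])
    u1 = layers.Conv2DTranspose(nf, 3, strides=2, padding="same", activation="relu")(u2)
    u1 = layers.Concatenate()([u1, d1])
    # 10 params per mixture component: 1 weight, 3 means, 3 scales, 3 coeffs
    out = layers.Conv2D(n_mix * 10, 1)(u1)
    return tf.keras.Model(x_in, out)
```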

Performance & Ablation Study

On CIFAR-10, training on a machine with 8 Maxwell TITAN X GPUs reaches 3.0 bits per dimension in about 10 hours, and the model converges to 2.92 bits per dimension after approximately 5 days.
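For reference, bits per dimension is the average negative log-likelihood converted from nats to bits and normalized by the number of pixel-channels; a one-line helper (hypothetical name, assuming CIFAR-10 dimensions):

```python
import numpy as np

def bits_per_dim(total_nll_nats, num_images, dims=32 * 32 * 3):
    # Average NLL per image and per dimension, converted from nats to bits
    return total_nll_nats / (num_images * dims * np.log(2.0))
```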

Code