Info
- Title: Conditional Image Generation with PixelCNN Decoders
- Task: Image Generation
- Author: A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu
- Date: Jun. 2016
- Arxiv: 1606.05328
- Published: NIPS 2016
- Affiliation: Google DeepMind
Highlights & Drawbacks
- Generation can be conditioned on class labels or on embeddings produced by a convolutional network
- The conditional PixelCNN can also serve as a powerful decoder in an image autoencoder
Motivation & Design
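To make sure the CNN can only use information about pixels above and to the left of the current pixel, the convolution filters in PixelCNN are masked. However, the computational cost rises rapidly as these masked layers are stacked. The paper also notes that stacked masked filters leave a blind spot in the receptive field, which it removes by combining a vertical and a horizontal convolution stack.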
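The masking idea can be sketched by building the 0/1 mask that is multiplied into a square filter before each convolution. This is a minimal single-channel sketch, not the authors' code: the function name `causal_mask` is hypothetical, the "A"/"B" naming follows the mask types of the PixelRNN paper, and the ordering of the RGB channels within a pixel is ignored.

```python
def causal_mask(kernel_size, mask_type="B"):
    """Return a kernel_size x kernel_size list-of-lists 0/1 mask.

    Positions above the centre row, and left of the centre in the centre
    row, are kept. Type "A" also hides the centre pixel itself (first
    layer); type "B" keeps it (later layers).
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    if mask_type not in ("A", "B"):
        raise ValueError("mask_type must be 'A' or 'B'")

    centre = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for row in range(kernel_size):
        for col in range(kernel_size):
            above = row < centre
            left = row == centre and col < centre
            is_centre = row == centre and col == centre
            if above or left or (is_centre and mask_type == "B"):
                mask[row][col] = 1
    return mask


if __name__ == "__main__":
    for row in causal_mask(5, "A"):
        print(row)
```

Multiplying a filter element-wise by this mask zeroes every weight that would look at the current pixel's right-hand or lower neighbours.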
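The gated activation unit replaces the ReLU between the masked convolutions:
$$y = \tanh(W_{k,f} ∗ x) ⊙ σ(W_{k,g} ∗ x)$$
where $σ$ is the sigmoid non-linearity, $k$ is the index of the layer, $f$ and $g$ denote the filter and gate feature maps, $⊙$ is the element-wise product and $∗$ is the convolution operator.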
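The gated unit multiplies a tanh branch by a sigmoid gate, element by element. The sketch below assumes the two masked convolutions have already been computed and are passed in as flat lists of pre-activations. The names `gated_activation`, `filter_pre` and `gate_pre` are illustrative and do not come from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_activation(filter_pre, gate_pre):
    """Return tanh(filter_pre) * sigmoid(gate_pre), element-wise.

    filter_pre and gate_pre stand in for the outputs of the two masked
    convolutions (the filter branch and the gate branch).
    """
    if len(filter_pre) != len(gate_pre):
        raise ValueError("both branches must have the same length")
    return [math.tanh(f) * sigmoid(g) for f, g in zip(filter_pre, gate_pre)]


if __name__ == "__main__":
    # A strongly negative gate pre-activation closes the gate.
    print(gated_activation([1.0, 1.0], [5.0, -5.0]))
```

Because the gate lies in (0, 1) and the tanh branch in (-1, 1), every output is bounded in (-1, 1).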
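To condition on a high-level image description represented as a latent vector $h$, a learned projection of $h$ is added inside both non-linearities:
$$y = \tanh(W_{k,f} ∗ x + V_{k,f}^T h) ⊙ σ(W_{k,g} ∗ x + V_{k,g}^T h)$$
where $V_{k,f}$ and $V_{k,g}$ are the projection matrices of layer $k$. When $h$ is a one-hot class label, this amounts to a class-dependent bias in every layer.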
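Conditioning on $h$ can be sketched as a per-channel bias, computed once from $h$ and added at every spatial position before the tanh and sigmoid branches. This is a minimal pure-Python sketch under stated assumptions: the convolution outputs are given as a list of positions, each a list of channel values, and the names `project`, `conditional_gated_activation`, `V_f` and `V_g` are illustrative.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def project(V, h):
    """Compute V^T h, with V stored as len(h) rows of n_channels values."""
    n_channels = len(V[0])
    return [sum(V[i][c] * h[i] for i in range(len(h)))
            for c in range(n_channels)]


def conditional_gated_activation(filter_pre, gate_pre, h, V_f, V_g):
    """Gated unit with the same h-dependent bias at every position.

    filter_pre, gate_pre: lists over positions of per-channel values,
    standing in for the two masked-convolution outputs.
    """
    bias_f = project(V_f, h)
    bias_g = project(V_g, h)
    return [
        [math.tanh(f + bf) * _sigmoid(g + bg)
         for f, g, bf, bg in zip(f_pos, g_pos, bias_f, bias_g)]
        for f_pos, g_pos in zip(filter_pre, gate_pre)
    ]


if __name__ == "__main__":
    h = [0.0, 1.0]                      # one-hot label for class 1
    V_f = [[1.0, 2.0], [3.0, 4.0]]      # row i is the bias for class i
    V_g = [[0.0, 0.0], [0.0, 0.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]    # two positions, two channels
    print(conditional_gated_activation(zeros, zeros, h, V_f, V_g))
```

With a one-hot `h`, `project` simply selects one row of `V_f` or `V_g`, so the conditioning does not depend on the pixel location.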
Performance & Ablation Study
Class-conditioned
Latent Vector (Embedding learned by convolutional networks)
Code
Related
- PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications - Salimans - ICLR 2017
- PixelRNN & PixelCNN: Pixel Recurrent Neural Networks - van den Oord - ICML 2016
- VQ-VAE: Neural Discrete Representation Learning - van den Oord - NIPS 2017
- VQ-VAE-2: Generating Diverse High-Fidelity Images with VQ-VAE-2 - Razavi - 2019