Conditional Image Generation with PixelCNN Decoders - van den Oord - NIPS 2016 - TensorFlow & PyTorch Code

Info

Title: Conditional Image Generation with PixelCNN Decoders
Task: Image Generation
Author: A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu
Date: Jun. 2016
Arxiv: 1606.05328
Published: NIPS 2016
Affiliation: Google DeepMind

Highlights & Drawbacks

Can be conditioned on class labels or on embeddings produced by a convolutional network
Can also serve as a powerful decoder, e.g. in an auto-encoder

Motivation & Design

To ensure the CNN only uses information about pixels above and to the left of the current pixel, PixelCNN masks its convolution filters. However, the computational cost rises rapidly as these masked layers are stacked.
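A minimal NumPy sketch of how such a mask can be built, assuming a single-channel k×k kernel (the per-colour-channel masking of the full model is omitted). The "A"/"B" distinction follows the original PixelCNN formulation: type A is used in the first layer and also hides the current pixel, while type B lets a position see its own features from the previous layer. `causal_mask` is a hypothetical helper name, not an API from the paper's code.

```python
import numpy as np

def causal_mask(kernel_size: int, mask_type: str = "B") -> np.ndarray:
    """Hypothetical helper: 0/1 mask for a single-channel PixelCNN-style masked conv.

    Positions set to 0 correspond to pixels below, or to the right of, the centre.
    """
    if kernel_size % 2 != 1:
        raise ValueError("kernel_size must be odd")
    if mask_type not in ("A", "B"):
        raise ValueError("mask_type must be 'A' or 'B'")
    c = kernel_size // 2
    mask = np.ones((kernel_size, kernel_size), dtype=np.float32)
    # Centre row: hide everything right of the centre (type A also hides the centre itself).
    start = c if mask_type == "A" else c + 1
    mask[c, start:] = 0.0
    # Hide every row below the centre row.
    mask[c + 1:, :] = 0.0
    return mask

# Usage: zero out the "future" taps of a kernel before convolving with it.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3))
masked_w = w * causal_mask(3, "B")
```

Multiplying the weights by the mask (rather than skipping taps inside the convolution loop) keeps the layer an ordinary convolution, which is why frameworks can run it with a standard conv kernel.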

Gated PixelCNN replaces the rectified linear units between the masked convolutions with a gated activation unit: $y = \tanh(W_{k,f} * x) \odot \sigma(W_{k,g} * x)$, where $\sigma$ is the sigmoid non-linearity, $k$ is the layer index, $\odot$ is the element-wise product and $*$ is the convolution operator.
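The gated unit can be sketched in NumPy as below, assuming a single input and output channel and "same" zero padding. As in deep-learning frameworks, the "convolution" is implemented as cross-correlation. In the actual model the kernels `w_f` and `w_g` would first be multiplied by a causal mask; that step is left out here. `conv2d_same` and `gated_activation` are hypothetical names used only for illustration.

```python
import numpy as np

def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical helper: single-channel 2-D cross-correlation, zero padding, same output size."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def gated_activation(x: np.ndarray, w_f: np.ndarray, w_g: np.ndarray) -> np.ndarray:
    """y = tanh(W_f * x) ⊙ σ(W_g * x) for one layer k (single channel sketch)."""
    return np.tanh(conv2d_same(x, w_f)) * sigmoid(conv2d_same(x, w_g))
```

The sigmoid branch outputs values in (0, 1) and acts as a soft, per-pixel gate on the tanh branch, so each output stays in (-1, 1).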

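To condition on a high-level image description represented as a latent vector $h$, add a conditioning term to both pre-activations: $y = \tanh(W_{k,f} * x + V_{k,f}^{T} h) \odot \sigma(W_{k,g} * x + V_{k,g}^{T} h)$. Because $V_{k,f}^{T} h$ and $V_{k,g}^{T} h$ do not depend on the pixel position, they act as per-feature-map biases shared across the whole image.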
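A minimal NumPy sketch of the conditional unit, focusing on how $V^{T} h$ is broadcast over every pixel. To keep it short, the masked convolutions $W_{k,f} * x$ and $W_{k,g} * x$ are replaced by 1×1 convolutions (a per-pixel linear map over channels); this substitution and the shapes `(C, H, W)`, `(P, C)`, `(D, P)` are assumptions for illustration, and `conditional_gated_activation` is a hypothetical name.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def conditional_gated_activation(x, w_f, w_g, v_f, v_g, h):
    """Hypothetical sketch of the conditional gated unit.

    x:        (C, H, W) input feature maps
    w_f, w_g: (P, C) weights of a 1x1 conv, standing in for the masked conv
    v_f, v_g: (D, P) conditioning weights V_{k,f}, V_{k,g}
    h:        (D,) latent description vector (e.g. a one-hot class label)
    returns   (P, H, W)
    """
    conv_f = np.einsum("pc,chw->phw", w_f, x)
    conv_g = np.einsum("pc,chw->phw", w_g, x)
    # V^T h yields one value per output feature map; [:, None, None] broadcasts it over all pixels.
    cond_f = (v_f.T @ h)[:, None, None]
    cond_g = (v_g.T @ h)[:, None, None]
    return np.tanh(conv_f + cond_f) * sigmoid(conv_g + cond_g)
```

When $h$ is a one-hot class label, $V^{T} h$ simply selects the row of $V$ belonging to that class, so each class gets its own learned bias per feature map.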