Generating Diverse High-Fidelity Images with VQ-VAE-2 - Razavi - 2019

 

Info

  • Title: Generating Diverse High-Fidelity Images with VQ-VAE-2
  • Task: Image Generation
  • Author: A. Razavi, A. van den Oord, and O. Vinyals
  • Date: Jun. 2019
  • Arxiv: 1906.00446
  • Affiliation: Google DeepMind

Highlights & Drawbacks

  • Diverse generated results
  • A multi-scale hierarchical organization of VQ-VAE
  • Self-attention layers added to the autoregressive prior model

Motivation & Design


Stage 1: Training hierarchical VQ-VAE

The hierarchical latent design aims to separate local patterns (e.g., texture) from global information (e.g., object shapes). The larger bottom-level code is conditioned on the smaller top-level code, so it does not have to encode everything from scratch.
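
A minimal PyTorch sketch of this two-level setup is shown below. The module names and shapes (`enc_bottom`, `enc_top`, 512-entry codebooks, a 256x256 input) are illustrative assumptions rather than the paper's exact architecture; the point is how the bottom code is quantized conditioned on the upsampled top code.

```python
# Hedged sketch of a two-level (top/bottom) VQ-VAE; layer sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                                  # z: (B, dim, H, W)
        z_flat = z.permute(0, 2, 3, 1).reshape(-1, z.size(1))
        dist = torch.cdist(z_flat, self.codebook.weight)   # distance to each code
        idx = dist.argmin(dim=1)                           # nearest codebook entry
        z_q = self.codebook(idx).view(z.size(0), z.size(2), z.size(3), -1)
        z_q = z_q.permute(0, 3, 1, 2)
        # codebook + commitment losses; straight-through estimator for the encoder
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx.view(z.size(0), z.size(2), z.size(3)), loss

class HierarchicalVQVAE(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.enc_bottom = nn.Conv2d(3, dim, 4, stride=4)     # 256x256 -> 64x64
        self.enc_top = nn.Conv2d(dim, dim, 2, stride=2)      # 64x64 -> 32x32
        self.quant_top = VectorQuantizer(dim=dim)
        self.quant_bottom = VectorQuantizer(dim=dim)
        self.upsample_top = nn.ConvTranspose2d(dim, dim, 2, stride=2)
        self.cond_bottom = nn.Conv2d(2 * dim, dim, 1)        # bottom conditioned on top
        self.decoder = nn.ConvTranspose2d(2 * dim, 3, 4, stride=4)

    def forward(self, x):
        h_bottom = self.enc_bottom(x)
        h_top = self.enc_top(h_bottom)
        q_top, _, loss_top = self.quant_top(h_top)
        top_up = self.upsample_top(q_top)
        # the bottom code only has to model what the top code does not explain
        q_bottom, _, loss_bottom = self.quant_bottom(
            self.cond_bottom(torch.cat([h_bottom, top_up], dim=1)))
        recon = self.decoder(torch.cat([top_up, q_bottom], dim=1))
        return recon, F.mse_loss(recon, x) + loss_top + loss_bottom
```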


Stage 2: Learning a prior over the discrete latent codes

To generate new images, the decoder must receive latent codes drawn from a distribution similar to the one it saw during training. A powerful autoregressive model, augmented with multi-headed self-attention layers, is therefore fit over the discrete codes; the attention layers enlarge the receptive field so the prior can capture correlations between spatially distant locations in the image.
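
The sketch below illustrates such a prior over the flattened grid of top-level code indices. For brevity it uses a standard causal Transformer encoder with multi-head self-attention, whereas the paper's prior is a PixelCNN augmented with attention layers; the names and hyperparameters (`LatentPrior`, `num_codes=512`, a 32x32 latent grid) are assumptions.

```python
# Hedged sketch: autoregressive prior with multi-head self-attention over code indices.
import torch
import torch.nn as nn

class LatentPrior(nn.Module):
    def __init__(self, num_codes=512, dim=256, n_heads=8, n_layers=4, grid=32 * 32):
        super().__init__()
        self.tok_emb = nn.Embedding(num_codes, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, grid, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(dim, num_codes)
        self.grid = grid

    def forward(self, idx):                      # idx: (B, L) discrete code indices
        h = self.tok_emb(idx) + self.pos_emb[:, : idx.size(1)]
        # causal mask so each position attends only to earlier positions
        mask = nn.Transformer.generate_square_subsequent_mask(idx.size(1)).to(idx.device)
        h = self.blocks(h, mask=mask)
        return self.head(h)                      # logits over the codebook, (B, L, num_codes)

    @torch.no_grad()
    def sample(self, batch=4, device="cpu"):
        # the first code is seeded arbitrarily here; a learned start token could be used
        idx = torch.zeros(batch, 1, dtype=torch.long, device=device)
        for _ in range(self.grid - 1):
            logits = self(idx)[:, -1]
            idx = torch.cat([idx, torch.multinomial(logits.softmax(-1), 1)], dim=1)
        return idx                               # code map to feed the VQ-VAE decoder
```

In the full pipeline, sampled top-level codes would condition a second, bottom-level prior, and the resulting code maps are passed through the trained VQ-VAE decoder to produce images.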


Performance & Ablation Study

Diverse generated samples, compared to BigGAN.

Quantitative results: Inception Score, FID, and Precision-Recall metrics.