Generating Diverse High-Fidelity Images with VQ-VAE-2 - Razavi - 2019

Info

Title: Generating Diverse High-Fidelity Images with VQ-VAE-2
Task: Image Generation
Author: A. Razavi, A. van den Oord, and O. Vinyals
Date: Jun. 2019
Arxiv: 1906.00446
Affiliation: Google DeepMind

Highlights & Drawbacks

Diverse generated results
A multi-scale hierarchical organization of VQ-VAE
Multi-headed self-attention layers in the autoregressive prior

Motivation & Design


Stage 1: Training hierarchical VQ-VAE

The hierarchy of latent variables is designed to separate local patterns (e.g., texture) from global information (e.g., object shapes). The larger bottom-level latent map is conditioned on the smaller top-level codes, so the bottom level does not have to learn everything from scratch and can concentrate on local detail.
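The two-level quantization described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the encoder outputs are random stand-ins, the sizes (`D`, codebook sizes, 2x2 and 4x4 grids) are arbitrary, and conditioning the bottom level by adding the upsampled top-level code is an assumption made for brevity. The names `quantize`, `top_codebook`, and `bottom_codebook` are hypothetical.

```python
import numpy as np

def quantize(z, codebook):
    """Replace each D-dim vector in z (..., D) by its nearest codebook row (K, D)."""
    flat = z.reshape(-1, z.shape[-1])
    # Squared Euclidean distance from every vector to every code: shape (N, K).
    dist = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dist.argmin(axis=1)
    return idx.reshape(z.shape[:-1]), codebook[idx].reshape(z.shape)

rng = np.random.default_rng(0)
D = 4                                      # latent dimension (arbitrary)
top_codebook = rng.normal(size=(8, D))     # small top-level codebook
bottom_codebook = rng.normal(size=(16, D)) # larger bottom-level codebook

# Stand-ins for encoder outputs: a coarse 2x2 top map and a finer 4x4 bottom map.
z_top = rng.normal(size=(2, 2, D))
z_bottom = rng.normal(size=(4, 4, D))

top_idx, top_q = quantize(z_top, top_codebook)

# Assumption: condition the bottom level by adding the nearest-neighbour
# upsampled top-level code to the bottom features before quantizing.
top_up = top_q.repeat(2, axis=0).repeat(2, axis=1)
bottom_idx, bottom_q = quantize(z_bottom + top_up, bottom_codebook)
```

After this runs, `top_idx` holds a 2x2 grid of integer codes for global structure and `bottom_idx` a 4x4 grid for local detail; these integer grids are what the Stage 2 prior would model.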


Stage 2: Learning a prior over the discrete latent codes

Fitting a prior over the discrete latent codes means that, at sampling time, the decoder receives codes drawn from a distribution similar to the one it saw during training. The prior is a powerful autoregressive model augmented with multi-headed self-attention layers, which enlarge the receptive field so that correlations between spatially distant locations in the image can be captured.
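To illustrate why self-attention helps with long-range correlations, here is a minimal NumPy sketch of causally masked multi-head self-attention over a flattened sequence of code embeddings. It is a generic attention layer, not the paper's prior network: the function name `causal_self_attention`, the random weights, and the sizes are hypothetical, and it assumes the inputs are already shifted so that position t only carries information about codes before t.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v, num_heads):
    """x: (T, D) sequence of embeddings; w_q, w_k, w_v: (D, D). Returns (T, D)."""
    T, D = x.shape
    head_dim = D // num_heads

    def split_heads(m):
        # (T, D) -> (num_heads, T, head_dim)
        return m.reshape(T, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (H, T, T)

    # Causal mask: position t may attend to positions <= t only.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = weights @ v                                # (H, T, head_dim)
    return out.transpose(1, 0, 2).reshape(T, D)      # merge heads

rng = np.random.default_rng(0)
T, D, H = 16, 8, 2   # e.g. a 4x4 code grid flattened in raster order
x = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))
y = causal_self_attention(x, w_q, w_k, w_v, H)
```

Every position can draw on any earlier position in a single layer, regardless of how far apart the two are in the image, whereas a purely convolutional autoregressive model needs many stacked layers to cover the same distance.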
