Semantic Photo Manipulation with a Generative Image Prior - Bau - SIGGRAPH 2019 - PyTorch

 

Info

  • Title: Semantic Photo Manipulation with a Generative Image Prior
  • Task: Image Manipulation
  • Authors: David Bau, Hendrik Strobelt, Jonas Wulff, Bolei Zhou, Jun-Yan Zhu, Antonio Torralba
  • Date: July 2019
  • Published: ACM SIGGRAPH 2019
  • Affiliation: MIT CSAIL

Highlights & Drawbacks

  • Image-specific generator for preserving semantic representation after editing
  • Interactive tool for semantic editing
  • An optimization step needed after editing, which takes about 30 seconds on a modern GPU

Motivation & Design

The role of deep generative models here is twofold: to provide latent semantic representations in which concepts can be manipulated directly, and to preserve image realism when semantic changes are made.

[Figure: overview of the editing method, panels (a) to (d)]

Overall process:

  1. We first compute a latent vector $z = E(x)$ representing $x$.
  2. We then apply a semantic vector space operation $z_e = edit(z)$ in the latent space; this could add, remove, or alter a semantic concept in the image.
  3. Finally, we regenerate the image from the modified $z_e$ .
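
The three steps can be sketched as a small pipeline. The version below is only an illustration in plain Python: `encode`, `edit` and `generate` are hypothetical stand-ins for $E$, the latent edit and $G$. The toy encoder and generator are simple scalings rather than neural networks, and modelling the edit as adding a scaled direction vector is an assumption, not something the steps above specify.

```python
from typing import Callable, List

Vector = List[float]


def semantic_edit(
    x: Vector,
    encode: Callable[[Vector], Vector],
    edit: Callable[[Vector], Vector],
    generate: Callable[[Vector], Vector],
) -> Vector:
    """Run the three-step pipeline: z = E(x), z_e = edit(z), output G(z_e)."""
    z = encode(x)          # step 1: latent vector representing x
    z_e = edit(z)          # step 2: semantic operation in latent space
    return generate(z_e)   # step 3: regenerate the image from z_e


# Toy stand-ins (assumptions for illustration only, not real networks).
def toy_encode(x: Vector) -> Vector:
    return [0.5 * v for v in x]


def toy_generate(z: Vector) -> Vector:
    return [2.0 * v for v in z]


def add_direction(direction: Vector, strength: float) -> Callable[[Vector], Vector]:
    """Build an edit that moves z along a hypothetical concept direction."""
    def edit(z: Vector) -> Vector:
        return [v + strength * d for v, d in zip(z, direction)]
    return edit


x = [2.0, 4.0]
edited = semantic_edit(x, toy_encode, add_direction([1.0, 0.0], 0.5), toy_generate)
print(edited)
```

In this toy setup the generator inverts the encoder exactly, so an empty edit returns the input unchanged. A real GAN generally cannot reproduce an arbitrary photo this way.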

Unfortunately, as panel (b) of the overview figure shows, the generator $G$ usually cannot reproduce the input image $x$ exactly, so (c) using $G$ to create the edited image $G(z_e)$ loses many attributes and details of the original image (a). Therefore, to generate the image we propose a new last step: (d) we learn an image-specific generator $G′$ that produces $x′_e = G′(z_e)$, which is faithful to the original image $x$ in the unedited regions.

Controllable Image Synthesis with GANs

Seek a latent code $z$ that minimizes the reconstruction loss $\mathcal{L}_r$ between the input image $x$ and the generated image $G(z)$:

$z = \arg\min_z \mathcal{L}_r(x, G(z))$
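
As a rough illustration of this search, the sketch below runs plain gradient descent on a toy problem. Everything specific in it is an assumption made for the example: the generator is a fixed linear map `W` instead of a GAN, the reconstruction loss is a sum of squared differences, and the learning rate and step count are arbitrary. `z_init` plays the role of an initial guess such as $E(x)$.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def generate(W: Matrix, z: Vector) -> Vector:
    """Toy linear generator G(z) = W z (an assumption, not a real GAN)."""
    return [sum(w * v for w, v in zip(row, z)) for row in W]


def reconstruction_loss(x: Vector, x_hat: Vector) -> float:
    """Sum of squared differences between target and generated image."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def invert(W: Matrix, x: Vector, z_init: Vector,
           lr: float = 0.1, steps: int = 200) -> Vector:
    """Gradient descent on z to minimize ||G(z) - x||^2."""
    z = list(z_init)
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(W, z), x)]
        # Gradient of the squared error with respect to z is 2 * W^T (W z - x).
        grad = [2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(z))]
        z = [v - lr * g for v, g in zip(z, grad)]
    return z


# The target lies outside the range of the toy generator, so the best
# reconstruction still has a nonzero loss.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, 2.0, 0.0]
z_star = invert(W, x, z_init=[0.0, 0.0])
print(z_star, reconstruction_loss(x, generate(W, z_star)))
```

The leftover loss in this example mirrors the point of the method: even the best latent code may not reproduce the input exactly, which motivates adapting the generator to the image.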

To ensure that the image-specific generator $G′$ has a latent space structure similar to that of the original generator $G$, we construct $G′$ by preserving all the early layers of $G$ exactly and applying perturbations only at the layers of the network that determine the fine-grained details.

A small network $R$ is trained to produce small perturbations $δ_i$ that multiply each layer’s output in $G_F$, the later layers of $G$ that determine fine-grained details, by $1 + δ_i$. Each $δ_i$ has the same number of channels and dimensions as the feature map of $G_F$ at layer $i$. This multiplicative change adjusts each feature map activation so that the output stays faithful to the original image. (Similar results can be obtained by using additive $δ_i$.) Formally, we construct $G′_F$ as follows:
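
One way to write this construction, using notation introduced here as an assumption ($g_1, \dots, g_n$ for the layers of $G_F$ and $h$ for the features coming from the preserved early layers), is $a_0 = h$, $a_i = (1 + δ_i) \odot g_i(a_{i-1})$ and $G′_F(h) = a_n$, where $\odot$ is element-wise multiplication. Whether the final layer's output is also perturbed is not stated above. The sketch below simply perturbs every layer for which a $δ_i$ is supplied. It is a minimal plain-Python illustration with toy layers, and the perturbations are passed in directly rather than produced by a trained network $R$.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_perturbed(layers: Sequence[Layer], deltas: Sequence[Vector],
                  h: Vector) -> Vector:
    """Apply each layer, then scale its output element-wise by (1 + delta_i)."""
    if len(layers) != len(deltas):
        raise ValueError("need exactly one delta per layer")
    a = list(h)
    for layer, delta in zip(layers, deltas):
        out = layer(a)
        if len(delta) != len(out):
            raise ValueError("delta must match the layer output shape")
        a = [(1.0 + d) * v for d, v in zip(delta, out)]
    return a


# Toy layers standing in for the fine-grained layers (illustrative assumptions).
def scale_relu(a: Vector) -> Vector:
    return [max(0.0, 2.0 * v) for v in a]


def shift(a: Vector) -> Vector:
    return [v + 1.0 for v in a]


layers = [scale_relu, shift]
h = [1.0, -1.0]

# All-zero perturbations reproduce the unperturbed layers.
unperturbed = run_perturbed(layers, [[0.0, 0.0], [0.0, 0.0]], h)
# Nonzero perturbations rescale individual activations.
adapted = run_perturbed(layers, [[0.5, 0.0], [0.0, -0.5]], h)
print(unperturbed, adapted)
```

Because the perturbation is multiplicative, setting every $δ_i$ to zero recovers the original layers, so small perturbations keep the adapted generator close to $G$.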

Performance & Ablation Study

Examples of the editing workflow. From left to right: the input image $x$ is first converted to the GAN image $G(z)$ and then edited by painting a mask. The effect of this mask edit can be previewed at interactive rates as $G(z_e)$, and the final result is rendered using image-specific adaptation as $G′(z_e)$.

Changing the appearance of domes, grass, and trees. In each section, we show the original image $x$, the user’s edit overlaid on $x$, and three variations under different selections of the reference image. Additionally, we show reconstructions of the reference image from $G$. In (c), we fix the reference image and vary only the strength term $s$.

Code