Relightify: Relightable 3D Faces from a Single Image via Diffusion Models

ICCV 2023

¹Imperial College London, ²Huawei Noah’s Ark Lab

Relightify reconstructs highly accurate relightable 3D face avatars from a single image.

Abstract

Following the remarkable success of diffusion models in image generation, recent works have also demonstrated their impressive ability to address a number of inverse problems in an unsupervised way, by properly constraining the sampling process based on a conditioning input. Motivated by this, we present the first approach to use diffusion models as a prior for highly accurate 3D facial BRDF reconstruction from a single image. We start by leveraging a high-quality UV dataset of facial reflectance (diffuse albedo, specular albedo, and normals), which we render under varying illumination settings to simulate natural RGB textures, and then train an unconditional diffusion model on concatenated pairs of rendered textures and reflectance components. At test time, we fit a 3D morphable model to the given image and unwrap the face into a partial UV texture. By sampling from the diffusion model while keeping the observed texture intact, the model inpaints not only the self-occluded areas but also the unknown reflectance components, in a single sequence of denoising steps. In contrast to existing methods, we acquire the observed texture directly from the input image, resulting in more faithful and consistent reflectance estimation. Through a series of qualitative and quantitative comparisons, we demonstrate superior performance in both texture completion and reflectance reconstruction.

Video

Overview

We propose a diffusion-based inpainting approach that estimates both the UV texture (with the baked-in illumination of the input image) and the actual reflectance of a face, from a single image and in a single process. At the core of our approach lies an unconditional diffusion model trained on pairs of textures and their accompanying reflectance maps.
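To make the training setup concrete, below is a minimal PyTorch-style sketch of one denoising-diffusion training step. It is a sketch under stated assumptions, not the authors' implementation: the denoiser network, the linear noise schedule, and the 3+9 channel layout (RGB texture plus diffuse albedo, specular albedo, and normals) are illustrative placeholders.

import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                      # linear schedule (assumption)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def training_step(denoiser, texture, reflectance):
    # texture:     (B, 3, H, W) rendered RGB texture in UV space
    # reflectance: (B, 9, H, W) diffuse albedo, specular albedo, normals (assumed layout)
    x0 = torch.cat([texture, reflectance], dim=1)          # (B, 12, H, W) joint input
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    a = alphas_cumprod.to(x0.device)[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise         # forward diffusion to step t
    return F.mse_loss(denoiser(x_t, t), noise)             # standard epsilon-prediction loss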

At test time, we first perform standard 3DMM fitting to obtain a partial UV texture via image-to-UV rasterization. Then, starting from random noise, we use the known texture to guide the sampling process of our texture/reflectance diffusion model towards completing the unobserved pixels, as sketched below. At the end of the process, we obtain high-quality rendering assets that make the reconstructed 3D avatar realistically renderable.
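The completion step can be sketched as RePaint-style masked sampling: at every reverse-diffusion step, the observed texture is noised to the current noise level and pasted over the known UV region, so only the unobserved texture pixels and the reflectance channels are synthesized. The snippet reuses T, alphas_cumprod, and denoiser from the training sketch above; the deterministic DDIM-style update and the names known/mask are illustrative choices, not necessarily the paper's exact sampler.

@torch.no_grad()
def inpaint_sample(denoiser, known, mask):
    # known: (1, 12, H, W); the first 3 channels hold the partial UV texture,
    #        everything else is zero. mask is 1 at observed texture pixels and
    #        0 elsewhere (all 9 reflectance channels are fully unknown).
    x = torch.randn_like(known)
    for t in reversed(range(T)):
        tt = torch.full((known.shape[0],), t, dtype=torch.long, device=known.device)
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = denoiser(x, tt)
        # deterministic DDIM step (eta = 0): estimate x0, then move to t-1
        x0_hat = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0_hat + (1.0 - a_prev).sqrt() * eps
        # noise the observed texture to the same level and paste it back,
        # so the known region stays intact while the rest is inpainted
        known_t = a_prev.sqrt() * known + (1.0 - a_prev).sqrt() * torch.randn_like(known)
        x = mask * known_t + (1.0 - mask) * x
    return x  # channels 0-2: completed texture, channels 3-11: reflectance maps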

[Figure: method overview]

Results

Reconstructions “in the wild”

[Figure: in-the-wild reconstructions]

Our avatars are compatible with commercial renderers, enabling photo-realistic rendering in different environments.

Challenging cases

[Figure: comparison on challenging cases]

We condition the reflectance prediction on the facial texture actually observed in the input image, leading to more accurate reflectance capture than fitting-based competitors.

BibTeX

@InProceedings{Paraperas_2023_ICCV,
    author={Paraperas Papantoniou, Foivos and Lattas, Alexandros and Moschoglou, Stylianos and Zafeiriou, Stefanos},
    title={Relightify: Relightable 3D Faces from a Single Image via Diffusion Models},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2023}
}