Shape, Reflectance, and Illumination From Appearance

Bibliographic Details
Main Author: Zhang, Xiuming
Other Authors: Freeman, William T.
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/140015
Description
Summary: The image formation process describes how light interacts with the objects in a scene and eventually reaches the camera, forming the image that we observe. Inverting this process is a long-standing, ill-posed problem in computer vision that involves estimating shape, material properties, and/or illumination passively from the object’s appearance. Such “inverse rendering” capabilities enable 3D understanding of our world (as desired in autonomous driving, robotics, etc.) and computer graphics applications such as relighting, view synthesis, and object capture (as desired in extended reality [XR], etc.). In this dissertation, we study inverse rendering by recovering three-dimensional (3D) shape, reflectance, illumination, or everything jointly under different setups. The input across these setups varies from single images, to multi-view images lit by multiple known lighting conditions, to multi-view images under one unknown illumination. Across the setups, we explore optimization-based recovery that exploits multiple observations of the same object, learning-based reconstruction that relies heavily on data-driven priors, and a mixture of both. Depending on the target application, we perform inverse rendering at three different levels of decomposition: I) at a low level of abstraction, we develop physically-based models that explicitly solve for every term in the rendering equation; II) at a middle level, we utilize the light transport function to abstract away intermediate light bounces and model only the final “net effect”; and III) at a high level, we treat rendering as a black box and directly invert it with learned data-driven priors. We also demonstrate how higher-level abstraction leads to models that are simpler and applicable to single images, but that possess fewer capabilities. This dissertation discusses four instances of inverse rendering, gradually ascending in the level of abstraction. In the first instance, we focus on the low-level abstraction, where we decompose appearance explicitly into shape, reflectance, and illumination. To this end, we present a physically-based model capable of such full factorization under one unknown illumination, and another that handles one-bounce indirect illumination. In the second instance, we ascend to the middle level of abstraction, at which we model appearance with the light transport function, demonstrating how this level of modeling easily supports relighting with global illumination, view synthesis, and both tasks simultaneously. Finally, at the high level of abstraction, we employ deep learning to directly invert the rendering black box in a data-driven fashion. Specifically, in the third instance, we recover 3D shapes from single images by learning data-driven shape priors, and further make our reconstruction generalizable to novel shape classes unseen during training. Also relying on data-driven priors, the fourth instance concerns how to recover lighting from the appearance of the illuminated object, without explicitly modeling the image formation process.
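
For reference, the low and middle levels of abstraction mentioned in the summary correspond to two standard formulations; the notation below is the textbook form, not necessarily the exact parameterization used in the dissertation. The rendering equation expresses outgoing radiance at a surface point x in direction \omega_o as emitted radiance plus incoming radiance weighted by the BRDF f_r and the cosine foreshortening term:

L_o(x, \omega_o) = L_e(x, \omega_o) + \int_\Omega f_r(x, \omega_i, \omega_o) \, L_i(x, \omega_i) \, (\omega_i \cdot n) \, d\omega_i

At the middle level, the light transport function folds all intermediate bounces into a single linear map: for a fixed viewpoint, the vectorized image c observed under a lighting condition l (expressed in some lighting basis) is

c = T \, l

where each column of the transport matrix T is the image of the scene under one basis light. Relighting then amounts to applying T to a new lighting vector, with global illumination effects carried implicitly inside T.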