Unsupervised Compositional Image Decomposition with Diffusion Models

Our visual understanding of the world is factorized and compositional. With just a single observation, we can ascertain both global and local attributes in a scene, such as lighting, weather, and underlying objects. These attributes are highly compositional and can be combined in various ways to cre...


Bibliographic Details
Main Author: Su, Jocelin
Other Authors: Tenenbaum, Joshua B.
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151350
Description: Our visual understanding of the world is factorized and compositional. With just a single observation, we can ascertain both global and local attributes in a scene, such as lighting, weather, and underlying objects. These attributes are highly compositional and can be combined in various ways to create new representations of the world. This paper introduces Decomp Diffusion, an unsupervised method for decomposing images into a set of underlying compositional factors, each represented by a different diffusion model. We demonstrate how each decomposed diffusion model captures a different factor of the scene, ranging from global scene descriptors (e.g., shadows, foreground, or facial expression) to local scene descriptors (e.g., constituent objects). Furthermore, we show how these inferred factors can be flexibly composed and recombined both within and across different image datasets.
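The idea of composing and recombining inferred factors can be illustrated with a toy sketch. This is not the thesis's implementation: the record does not state how factors are combined, so averaging the per-factor noise predictions is an assumption, and `toy_denoiser`, `compose_noise`, and `recombine` are hypothetical names introduced only for illustration.

```python
import numpy as np


def toy_denoiser(x_t, z):
    """Hypothetical stand-in for a factor-conditioned diffusion model.

    A real model would be a neural network predicting noise from the
    noisy image x_t and a latent factor z; here it is a simple difference.
    """
    return x_t - z


def compose_noise(factor_preds):
    """Combine per-factor noise predictions into one prediction.

    Assumption: factors are combined by averaging. The actual method may
    use a different rule (for example, a sum or a weighted combination).
    """
    return np.mean(np.stack(factor_preds), axis=0)


def recombine(factors_a, factors_b, take_from_b):
    """Build a new factor set by swapping selected factors from B into A."""
    return [
        factors_b[k] if k in take_from_b else factors_a[k]
        for k in range(len(factors_a))
    ]


# Two images, each decomposed into two factors (toy 2x2 "latents").
factors_a = [np.full((2, 2), 1.0), np.full((2, 2), 2.0)]
factors_b = [np.full((2, 2), 10.0), np.full((2, 2), 20.0)]

# Keep factor 0 from image A and take factor 1 from image B.
mixed = recombine(factors_a, factors_b, take_from_b={1})

# One composed denoising prediction for a noisy image x_t.
x_t = np.zeros((2, 2))
eps = compose_noise([toy_denoiser(x_t, z) for z in mixed])
```

In this sketch, swapping which indices appear in `take_from_b` changes which image contributes each factor, which mirrors the recombination described in the abstract at a purely schematic level.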
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: M.Eng.
Date: 2023-06
Rights: In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
File Format: application/pdf