Score Distillation via DDIM Inversion

Bibliographic Details
Main Author: Lukoianov, Artem S.
Other Authors: Solomon, Justin
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/156343

Description

While 2D diffusion models generate realistic, high-detail images, 3D shape generation methods like Score Distillation Sampling (SDS) built on these 2D diffusion models produce cartoon-like, over-smoothed shapes. To help explain this discrepancy, in this paper we prove that the image guidance used in Score Distillation can be understood as the velocity field of a 2D denoising generative process, up to the choice of a noise term. In particular, after a change of variables, SDS resembles a high-variance version of Denoising Diffusion Implicit Models (DDIM) with a differently sampled noise term: SDS introduces noise i.i.d. randomly at each step, while DDIM infers it from the previous noise predictions. This excessive variance can lead to over-smoothing and prevent the algorithm from generating realistic outputs. We show that a better noise approximation can be recovered by inverting DDIM in each SDS update step. This modification makes SDS’s generative process for 2D images identical to DDIM, up to our change of variables. In 3D, it removes over-smoothing, preserves higher-frequency detail, and brings the generation quality closer to that of 2D samplers. Experimentally, our method achieves better or similar 3D generation quality compared to other works that improve SDS, all without training additional neural networks or 3D supervision. Our findings bridge the gap between 2D and 3D asset generation.
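
The abstract contrasts two ways of choosing the noise term in an SDS update: drawing it i.i.d. at every step, or inferring it by running DDIM inversion from the current image. The NumPy sketch below illustrates only that difference on a toy problem; it is not the thesis's implementation. Everything in it is an assumption made for illustration: a closed-form noise predictor for Gaussian toy data stands in for a pretrained 2D diffusion model, the schedule is a linear DDPM beta schedule with 1000 steps, inversion uses 50 steps, the SDS weight w(t) is set to 1, and the image is optimized directly rather than rendered from a 3D representation. The names `eps_hat`, `ddim_invert`, `sds_grad`, and `sds_inverted_grad` are hypothetical.

```python
import numpy as np

# ASSUMPTION: toy stand-in for a pretrained 2D diffusion model. Data are
# N(MU, S^2) per pixel, so the optimal noise predictor has a closed form.
MU, S = 0.5, 0.5
T = 1000
BETAS = np.linspace(1e-4, 0.02, T)      # assumed linear DDPM schedule
ALPHA_BAR = np.cumprod(1.0 - BETAS)     # cumulative product, \bar{alpha}_t


def eps_hat(x_t, t):
    """Closed-form E[eps | x_t] for x_t = sqrt(a) x0 + sqrt(1 - a) eps."""
    a = ALPHA_BAR[t]
    var = a * S**2 + (1.0 - a)          # per-pixel variance of x_t
    return np.sqrt(1.0 - a) * (x_t - np.sqrt(a) * MU) / var


def ddim_invert(x0, t, n_steps=50):
    """Deterministic DDIM inversion of a clean image x0 up to timestep t.

    Each step reuses the noise prediction at the current (lower) timestep,
    the usual approximation behind DDIM inversion. x0 is treated as x at t=0.
    """
    ts = np.linspace(0, t, n_steps + 1).round().astype(int)
    x = np.array(x0, dtype=float)
    for s, u in zip(ts[:-1], ts[1:]):
        a_s, a_u = ALPHA_BAR[s], ALPHA_BAR[u]
        e = eps_hat(x, s)
        x0_pred = (x - np.sqrt(1.0 - a_s) * e) / np.sqrt(a_s)
        x = np.sqrt(a_u) * x0_pred + np.sqrt(1.0 - a_u) * e
    return x


def sds_grad(x, t, rng):
    """Standard SDS direction (w(t) = 1): noise drawn i.i.d. on every call."""
    a = ALPHA_BAR[t]
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps
    return eps_hat(x_t, t) - eps


def sds_inverted_grad(x, t):
    """SDS-style direction whose noise is inferred by DDIM inversion."""
    a = ALPHA_BAR[t]
    x_t = ddim_invert(x, t)
    eps = (x_t - np.sqrt(a) * x) / np.sqrt(1.0 - a)  # noise consistent with x_t
    return eps_hat(x_t, t) - eps


rng = np.random.default_rng(0)
x = np.full(4, 0.9)   # hypothetical 4-pixel "image" being optimized
t = 500
sds_draws = np.stack([sds_grad(x, t, rng) for _ in range(256)])
inv_draws = np.stack([sds_inverted_grad(x, t) for _ in range(2)])
print(sds_draws.std(axis=0).mean())               # nonzero spread across draws
print(np.abs(inv_draws[0] - inv_draws[1]).max())  # 0.0
```

Repeated calls to `sds_grad` give different update directions for the same image and timestep, while `sds_inverted_grad` is deterministic. The Gaussian toy keeps every quantity in closed form so the two noise choices can be compared directly; with a real model, `eps_hat` would be a network's noise prediction (with whatever conditioning and guidance the method uses), which this sketch does not attempt to reproduce.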

Additional Details
Contributors: Lukoianov, Artem S.; Solomon, Justin; Sitzmann, Vincent
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: S.M.
Date: 2024-05
Record dates: 2024-07-10, 2024-08-21
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/; copyright retained by author(s)
File format: application/pdf