ComboVerse: compositional 3D assets creation using spatially-aware diffusion guidance
Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. Though promising results have been achieved...
Main Authors: | Chen, Yongwei; Wang, Tengfei; Wu, Tong; Pan, Xingang; Jia, Kui; Liu, Ziwei |
---|---|
Other Authors: | College of Computing and Data Science |
Format: | Conference Paper |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science |
Online Access: | https://hdl.handle.net/10356/180240 http://arxiv.org/abs/2403.12409v1 |
_version_ | 1811678813025730560 |
---|---|
author | Chen, Yongwei Wang, Tengfei Wu, Tong Pan, Xingang Jia, Kui Liu, Ziwei |
author2 | College of Computing and Data Science |
author_sort | Chen, Yongwei |
collection | NTU |
description | Generating high-quality 3D assets from a given image is highly desirable in
various applications such as AR/VR. Recent advances in single-image 3D
generation explore feed-forward models that learn to infer the 3D model of an
object without optimization. Though promising results have been achieved in
single object generation, these methods often struggle to model complex 3D
assets that inherently contain multiple objects. In this work, we present
ComboVerse, a 3D generation framework that produces high-quality 3D assets with
complex compositions by learning to combine multiple models. 1) We first
perform an in-depth analysis of this "multi-object gap" from both model and
data perspectives. 2) Next, with reconstructed 3D models of different objects,
we seek to adjust their sizes, rotation angles, and locations to create a 3D
asset that matches the given image. 3) To automate this process, we apply
spatially-aware score distillation sampling (SSDS) from pretrained diffusion
models to guide the positioning of objects. Our proposed framework emphasizes
spatial alignment of objects, compared with standard score distillation
sampling, and thus achieves more accurate results. Extensive experiments
validate that ComboVerse achieves clear improvements over existing methods in
generating compositional 3D assets. |
first_indexed | 2024-10-01T02:59:13Z |
format | Conference Paper |
id | ntu-10356/180240 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T02:59:13Z |
publishDate | 2024 |
record_format | dspace |
spelling | ntu-10356/180240 2024-09-26T02:18:03Z 2024 European Conference on Computer Vision (ECCV) S-Lab Submitted/Accepted version 2024-09-26T00:44:52Z 2024-09-26T00:44:52Z 2024 Conference Paper Chen, Y., Wang, T., Wu, T., Pan, X., Jia, K. & Liu, Z. (2024). ComboVerse: compositional 3D assets creation using spatially-aware diffusion guidance. 2024 European Conference on Computer Vision (ECCV). https://dx.doi.org/10.48550/arXiv.2403.12409 https://hdl.handle.net/10356/180240 10.48550/arXiv.2403.12409 http://arxiv.org/abs/2403.12409v1 en © 2024 ECCV. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. application/pdf |
title | ComboVerse: compositional 3D assets creation using spatially-aware diffusion guidance |
topic | Computer and Information Science |
url | https://hdl.handle.net/10356/180240 http://arxiv.org/abs/2403.12409v1 |
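Step 2 of the description mentions adjusting each reconstructed object's size, rotation angle, and location. The following is a minimal sketch of what such a per-object placement could look like, assuming a uniform scale, a rotation about the vertical (y) axis, and a translation. The helper name `place_object` and this parametrization are illustrative assumptions, not taken from the paper, and the SSDS guidance that would optimize these parameters is not shown.

```python
import math


def place_object(points, scale, yaw_deg, translation):
    """Apply a uniform scale, a rotation about the y axis, then a translation.

    points: list of (x, y, z) tuples in the object's local frame.
    translation: (tx, ty, tz) offset in the scene frame.
    Hypothetical helper for illustration only.
    """
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    placed = []
    for x, y, z in points:
        # Uniform scale about the object's local origin.
        x, y, z = scale * x, scale * y, scale * z
        # Right-handed rotation about the y axis.
        xr = c * x + s * z
        zr = -s * x + c * z
        # Move the object to its location in the scene.
        placed.append((xr + tx, y + ty, zr + tz))
    return placed


# Example: double the size, turn 90 degrees, and lift by one unit.
corner = place_object(
    [(1.0, 0.0, 0.0)], scale=2.0, yaw_deg=90.0, translation=(0.0, 1.0, 0.0)
)
```

In a pipeline like the one described, the three parameters would be the quantities to optimize per object, while the object's geometry stays fixed.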