StructLDM: structured latent diffusion for 3D human generation
Recent 3D human generative models have achieved remarkable progress by learning 3D-aware GANs from 2D images. However, existing 3D human generative methods model humans in a compact 1D latent space, ignoring the articulated structure and semantics of human body topology. In this paper, we explore...
Main Authors: | Hu, Tao; Hong, Fangzhou; Liu, Ziwei |
---|---|
Other Authors: | College of Computing and Data Science |
Format: | Conference Paper |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; 3D human generation; Latent diffusion model |
Online Access: | https://hdl.handle.net/10356/180233 http://arxiv.org/abs/2404.01241v3 |
author | Hu, Tao; Hong, Fangzhou; Liu, Ziwei
author2 | College of Computing and Data Science |
collection | NTU |
description | Recent 3D human generative models have achieved remarkable progress by
learning 3D-aware GANs from 2D images. However, existing 3D human generative
methods model humans in a compact 1D latent space, ignoring the articulated
structure and semantics of human body topology. In this paper, we explore a
more expressive and higher-dimensional latent space for 3D human modeling and
propose StructLDM, a diffusion-based unconditional 3D human generative model
learned from 2D images. StructLDM addresses the challenges imposed by the
high-dimensional growth of the latent space with three key designs: 1) a
semantic structured latent space defined on the dense surface manifold of a
statistical human body template; 2) a structured 3D-aware auto-decoder that
factorizes the global latent space into several semantic body parts
parameterized by a set of conditional structured local NeRFs anchored to the
body template, which embeds the properties learned from the 2D training data
and can be decoded to render view-consistent humans under different poses and
clothing styles; and 3) a structured latent diffusion model for generative
human appearance sampling. Extensive experiments validate StructLDM's
state-of-the-art generation performance and illustrate the expressiveness of
the structured latent space over the widely adopted 1D latent space. Notably,
StructLDM enables different levels of controllable 3D human generation and
editing, including pose/view/shape control, as well as high-level tasks such
as compositional generation, part-aware clothing editing, and 3D virtual
try-on. Our project page is at: https://taohuumd.github.io/projects/StructLDM/. |
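
The abstract describes an architecture rather than giving code; the following is a minimal, hypothetical PyTorch-style sketch of the three designs as described, assuming a SMPL-like UV parameterization of the body template: (1) a structured latent laid out as a 2D feature map over the body-surface UV manifold, (2) local conditional NeRF-style decoders anchored to body parts that turn that latent into color and density, and (3) DDPM-style sampling of new structured latents. All module names, shapes, the part layout, and the noise schedule below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; names, shapes, and the part layout are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalNeRF(nn.Module):
    """Tiny conditional NeRF for one body part: (3D point, local latent) -> (RGB, density)."""
    def __init__(self, latent_dim=16, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 RGB channels + 1 density channel
        )

    def forward(self, pts, z_local):
        out = self.mlp(torch.cat([pts, z_local], dim=-1))
        return torch.sigmoid(out[..., :3]), F.softplus(out[..., 3:])


class StructuredAutoDecoder(nn.Module):
    """Auto-decoder over a structured latent: a (C, H, W) feature map on the body UV
    manifold, factorized into body parts, each decoded by its own local NeRF."""
    def __init__(self, num_parts=8, latent_dim=16):
        super().__init__()
        self.parts = nn.ModuleList([LocalNeRF(latent_dim) for _ in range(num_parts)])

    def sample_latent(self, z_uv, uv):
        # z_uv: (1, C, H, W) structured latent; uv: (N, 2) surface coords in [-1, 1].
        feat = F.grid_sample(z_uv, uv.view(1, -1, 1, 2), align_corners=True)  # (1, C, N, 1)
        return feat.squeeze(-1).squeeze(0).t()                                # (N, C)

    def forward(self, z_uv, pts, uv, part_id):
        # pts: (N, 3) query points; uv: their projection onto the template surface;
        # part_id: (N,) index of the body part that owns each point.
        z_local = self.sample_latent(z_uv, uv)
        rgb = torch.zeros(pts.shape[0], 3)
        sigma = torch.zeros(pts.shape[0], 1)
        for i, nerf in enumerate(self.parts):
            mask = part_id == i
            if mask.any():
                rgb[mask], sigma[mask] = nerf(pts[mask], z_local[mask])
        return rgb, sigma  # fed to a standard volume-rendering integral (omitted)


@torch.no_grad()
def sample_structured_latent(denoiser, latent_shape=(1, 16, 128, 128), steps=50):
    """DDPM-style ancestral sampling over the structured 2D latent map.
    `denoiser` is any noise-prediction network (e.g., a small 2D UNet);
    the linear beta schedule below is a placeholder."""
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    z = torch.randn(latent_shape)
    for t in reversed(range(steps)):
        eps = denoiser(z, torch.tensor([t]))  # predicted noise at step t
        z = (z - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return z  # decode with StructuredAutoDecoder to render a novel 3D human
```

Laying the latent out as a 2D map over the body surface is what makes a standard image diffusion backbone directly applicable in the third design, and because regions of the map correspond to fixed body parts, editing a patch of the latent plausibly supports the part-aware clothing editing and compositional generation described above.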
format | Conference Paper |
id | ntu-10356/180233 |
institution | Nanyang Technological University |
language | English |
publishDate | 2024 |
record_format | dspace |
spelling | ntu-10356/180233 2024-09-26T03:00:27Z StructLDM: structured latent diffusion
for 3D human generation. Hu, Tao; Hong, Fangzhou; Liu, Ziwei. College of Computing and
Data Science; S-Lab. 2024 European Conference on Computer Vision (ECCV). Subjects:
Computer and Information Science; 3D human generation; Latent diffusion model. Funder:
Ministry of Education (MOE). Submitted/Accepted version. This study is supported by the
Ministry of Education, Singapore, under its MOE AcRF Tier 2 (MOET2EP20221-0012), NTU NAP,
and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP)
Funding Initiative, as well as cash and in-kind contribution from the industry
partner(s). 2024-09-26T01:19:29Z 2024-09-26T01:19:29Z 2024. Conference Paper. Citation:
Hu, T., Hong, F. & Liu, Z. (2024). StructLDM: structured latent diffusion for 3D human
generation. 2024 European Conference on Computer Vision (ECCV).
https://dx.doi.org/10.48550/arXiv.2404.01241 https://hdl.handle.net/10356/180233
10.48550/arXiv.2404.01241 http://arxiv.org/abs/2404.01241v3 en. Grants: MOET2EP20221-0012,
NTU-NAP, IAF-ICP, RIE2020. Dataset: 10.21979/N9/BXUEXV. © 2024 ECCV. All rights reserved.
This article may be downloaded for personal use only. Any other use requires prior
permission of the copyright holder. application/pdf |
title | StructLDM: structured latent diffusion for 3D human generation |
topic | Computer and Information Science; 3D human generation; Latent diffusion model
url | https://hdl.handle.net/10356/180233 http://arxiv.org/abs/2404.01241v3 |