A unified 3D human motion synthesis model via conditional variational auto-encoder
We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series. To further allow the flexibility of manipulating the motion style of the generated series, we design an Action-Adaptive Modulation (AAM) to propagate the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism to exploit distant relations among decoder and encoder features for better realism and global consistency. We conducted extensive experiments on Human 3.6M and CMU-Mocap. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly adapted by the given action labels.
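As a rough illustration of the conditioning scheme described in the abstract, the sketch below shows how a partial observation can be treated as a masked motion series feeding a conditional VAE. This is a hypothetical PyTorch sketch, not the authors' implementation; the GRU backbone, tensor layout, and layer sizes are assumptions made for brevity.

```python
# Hypothetical sketch (not the paper's code): a minimal conditional VAE that treats
# any partial observation as a masked motion series, as described in the abstract.
import torch
import torch.nn as nn

class MaskedMotionCVAE(nn.Module):
    def __init__(self, num_joints=17, dims=3, num_actions=15, latent=64, hidden=256):
        super().__init__()
        feat = num_joints * dims
        # The encoder sees the masked series, the mask itself, and the action label.
        self.encoder = nn.GRU(feat * 2 + num_actions, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        # The decoder reconstructs the full series from the latent code plus the condition.
        self.decoder = nn.GRU(feat * 2 + num_actions + latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat)

    def forward(self, motion, mask, action):
        # motion: (B, T, J*3) joint coordinates; mask: (B, T, J*3) with 1 = observed;
        # action: (B, num_actions) one-hot action label.
        B, T, _ = motion.shape
        cond = torch.cat([motion * mask, mask,
                          action.unsqueeze(1).expand(B, T, -1)], dim=-1)
        h, _ = self.encoder(cond)
        mu, logvar = self.to_mu(h[:, -1]), self.to_logvar(h[:, -1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        dec_in = torch.cat([cond, z.unsqueeze(1).expand(B, T, -1)], dim=-1)
        d, _ = self.decoder(dec_in)
        return self.out(d), mu, logvar
```

Training such a model would typically combine a reconstruction loss over the synthesized series with a KL term on (mu, logvar); at test time, sampling z from the prior is what yields diverse completions from the same partial input, matching the "estimate a distribution, then sample" formulation above.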
Main Authors: | Cai, Yujun, Wang, Yiwei, Zhu, Yiheng, Cham, Tat-Jen, Cai, Jianfei, Yuan, Junsong, Liu, Jun, Zheng, Chuanxia, Yan, Sijie, Ding, Henghui, Shen, Xiaohui, Liu, Ding, Thalmann, Nadia Magnenat |
Other Authors: | School of Computer Science and Engineering |
Format: | Conference Paper |
Language: | English |
Published: | 2023 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision, Gestures and Body Pose, Image and Video Synthesis |
Online Access: | https://hdl.handle.net/10356/172651 |
_version_ | 1826127044210065408 |
author | Cai, Yujun Wang, Yiwei Zhu, Yiheng Cham, Tat-Jen Cai, Jianfei Yuan, Junsong Liu, Jun Zheng, Chuanxia Yan, Sijie Ding, Henghui Shen, Xiaohui Liu, Ding Thalmann, Nadia Magnenat |
author2 | School of Computer Science and Engineering |
author_facet | School of Computer Science and Engineering Cai, Yujun Wang, Yiwei Zhu, Yiheng Cham, Tat-Jen Cai, Jianfei Yuan, Junsong Liu, Jun Zheng, Chuanxia Yan, Sijie Ding, Henghui Shen, Xiaohui Liu, Ding Thalmann, Nadia Magnenat |
author_sort | Cai, Yujun |
collection | NTU |
description | We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series. To further allow the flexibility of manipulating the motion style of the generated series, we design an Action-Adaptive Modulation (AAM) to propagate the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism to exploit distant relations among decoder and encoder features for better realism and global consistency. We conducted extensive experiments on Human 3.6M and CMU-Mocap. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly adapted by the given action labels. |
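The description above also mentions Action-Adaptive Modulation (AAM) and a cross-attention between decoder and encoder features. The following is a hypothetical sketch of one way such action-conditioned modulation and cross-attention could be wired together; a FiLM-style scale-and-shift is used here as a stand-in for AAM, and the module names and sizes are assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch (not the paper's implementation): action-conditioned,
# FiLM-style feature modulation plus cross-attention from decoder to encoder features.
import torch
import torch.nn as nn

class ActionModulatedCrossAttention(nn.Module):
    def __init__(self, dim=256, num_actions=15, heads=4):
        super().__init__()
        # Predict a per-channel scale and shift from the action label (one plausible
        # way to propagate semantic guidance through every frame of the sequence).
        self.to_scale_shift = nn.Linear(num_actions, 2 * dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_feat, enc_feat, action):
        # dec_feat: (B, T, dim) decoder features; enc_feat: (B, T_enc, dim) encoder
        # features; action: (B, num_actions) one-hot action label.
        scale, shift = self.to_scale_shift(action).chunk(2, dim=-1)
        modulated = dec_feat * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        # Every decoder frame attends to all encoder frames, exploiting distant
        # relations for realism and global consistency, as the abstract describes.
        attended, _ = self.cross_attn(query=modulated, key=enc_feat, value=enc_feat)
        return self.norm(modulated + attended)
```

The design intent this sketch tries to capture is that the action label touches every frame's features, so the style of the synthesized motion can be steered by the label, while the cross-attention lets each generated frame consult all observed frames rather than only its local neighborhood.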
first_indexed | 2024-10-01T07:02:19Z |
format | Conference Paper |
id | ntu-10356/172651 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T07:02:19Z |
publishDate | 2023 |
record_format | dspace |
spelling | ntu-10356/1726512023-12-19T04:35:26Z A unified 3D human motion synthesis model via conditional variational auto-encoder Cai, Yujun Wang, Yiwei Zhu, Yiheng Cham, Tat-Jen Cai, Jianfei Yuan, Junsong Liu, Jun Zheng, Chuanxia Yan, Sijie Ding, Henghui Shen, Xiaohui Liu, Ding Thalmann, Nadia Magnenat School of Computer Science and Engineering 2021 IEEE/CVF International Conference on Computer Vision (ICCV) Institute for Media Innovation (IMI) Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Gestures and Body Pose Image and Video Synthesis We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series. To further allow the flexibility of manipulating the motion style of the generated series, we design an Action-Adaptive Modulation (AAM) to propagate the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism to exploit distant relations among decoder and encoder features for better realism and global consistency. We conducted extensive experiments on Human 3.6M and CMU-Mocap. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly adapted by the given action labels. Nanyang Technological University National Research Foundation (NRF) This research is supported by Institute for Media Innovation, Nanyang Technological University (IMI-NTU) and the National Research Foundation, Singapore under its International Research Centres in Singapore Funding Initiative. This research is also supported in part by Monash FIT Start-up Grant and SenseTime Gift Fund, National Science Foundation Grant CNS1951952 and SUTD project PIE-SGP-Al-2020-02. 2023-12-19T04:35:26Z 2023-12-19T04:35:26Z 2022 Conference Paper Cai, Y., Wang, Y., Zhu, Y., Cham, T., Cai, J., Yuan, J., Liu, J., Zheng, C., Yan, S., Ding, H., Shen, X., Liu, D. & Thalmann, N. M. (2022). A unified 3D human motion synthesis model via conditional variational auto-encoder. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 11625-11635. https://dx.doi.org/10.1109/ICCV48922.2021.01144 9781665428125 https://hdl.handle.net/10356/172651 10.1109/ICCV48922.2021.01144 2-s2.0-85113641917 11625 11635 en © 2021 IEEE. All rights reserved. |
spellingShingle | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Gestures and Body Pose Image and Video Synthesis Cai, Yujun Wang, Yiwei Zhu, Yiheng Cham, Tat-Jen Cai, Jianfei Yuan, Junsong Liu, Jun Zheng, Chuanxia Yan, Sijie Ding, Henghui Shen, Xiaohui Liu, Ding Thalmann, Nadia Magnenat A unified 3D human motion synthesis model via conditional variational auto-encoder |
title | A unified 3D human motion synthesis model via conditional variational auto-encoder |
title_full | A unified 3D human motion synthesis model via conditional variational auto-encoder |
title_fullStr | A unified 3D human motion synthesis model via conditional variational auto-encoder |
title_full_unstemmed | A unified 3D human motion synthesis model via conditional variational auto-encoder |
title_short | A unified 3D human motion synthesis model via conditional variational auto-encoder |
title_sort | unified 3d human motion synthesis model via conditional variational auto encoder |
topic | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Gestures and Body Pose Image and Video Synthesis |
url | https://hdl.handle.net/10356/172651 |
work_keys_str_mv | AT caiyujun aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT wangyiwei aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT zhuyiheng aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT chamtatjen aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT caijianfei aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT yuanjunsong aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT liujun aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT zhengchuanxia aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT yansijie aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT dinghenghui aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT shenxiaohui aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT liuding aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT thalmannnadiamagnenat aunified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT caiyujun unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT wangyiwei unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT zhuyiheng unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT chamtatjen unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT caijianfei unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT yuanjunsong unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT liujun unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT zhengchuanxia unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT yansijie unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT dinghenghui unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT shenxiaohui unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT liuding unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder AT thalmannnadiamagnenat unified3dhumanmotionsynthesismodelviaconditionalvariationalautoencoder |