Modeling the Conditional Distribution of Co-Speech Upper Body Gesture Jointly Using Conditional-GAN and Unrolled-GAN
Co-speech gestures are a crucial, non-verbal modality for humans to communicate. Social agents also need this capability to be more human-like and comprehensive. This study aims to model the distribution of gestures conditioned on human speech features. Unlike previous studies that try to find injective functions that map speech to gestures, we propose a novel, conditional GAN-based generative model to not only convert speech into gestures but also to approximate the distribution of gestures conditioned on speech through parameterization. An objective evaluation and user study show that the proposed model outperformed the existing deterministic model, indicating that generative models can approximate real patterns of co-speech gestures better than the existing deterministic model. Our results suggest that it is critical to consider the nature of randomness when modeling co-speech gestures.
Main Authors: Bowen Wu, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro
Format: Article
Language: English
Published: MDPI AG, 2021-01-01
Series: Electronics
Subjects: gesture generation, social robots, generative model, neural network, deep learning
Online Access: https://www.mdpi.com/2079-9292/10/3/228
author | Bowen Wu; Chaoran Liu; Carlos Toshinori Ishi; Hiroshi Ishiguro
collection | DOAJ |
id | doaj.art-96649c7068e54be2a9114bdf569c3454 |
institution | Directory Open Access Journal |
issn | 2079-9292 |
doi | 10.3390/electronics10030228
affiliations | Bowen Wu: Interactive Robot Research Team, Institute of Physical and Chemical Research (RIKEN), Kyoto 619-0237, Japan; Chaoran Liu: Hiroshi Ishiguro Laboratories, Advanced Telecommunications Research Institute International (ATR), Kyoto 619-0237, Japan; Carlos Toshinori Ishi: Interactive Robot Research Team, Institute of Physical and Chemical Research (RIKEN), Kyoto 619-0237, Japan; Hiroshi Ishiguro: Graduate School of Engineering Science, Osaka University, Osaka 565-0871, Japan
topic | gesture generation; social robots; generative model; neural network; deep learning