Modeling the Conditional Distribution of Co-Speech Upper Body Gesture Jointly Using Conditional-GAN and Unrolled-GAN

Co-speech gestures are a crucial, non-verbal modality for humans to communicate. Social agents also need this capability to be more human-like and comprehensive. This study aims to model the distribution of gestures conditioned on human speech features. Unlike previous studies that try to find injective functions that map speech to gestures, we propose a novel, conditional GAN-based generative model to not only convert speech into gestures but also to approximate the distribution of gestures conditioned on speech through parameterization. An objective evaluation and user study show that the proposed model outperformed the existing deterministic model, indicating that generative models can approximate real patterns of co-speech gestures better than the existing deterministic model. Our results suggest that it is critical to consider the nature of randomness when modeling co-speech gestures.

Bibliographic Details
Main Authors: Bowen Wu, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro
Format: Article
Language: English
Published: MDPI AG, 2021-01-01
Series: Electronics, vol. 10, no. 3, article 228 (ISSN 2079-9292)
DOI: 10.3390/electronics10030228
Subjects: gesture generation; social robots; generative model; neural network; deep learning
Online Access: https://www.mdpi.com/2079-9292/10/3/228
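The abstract describes a conditional GAN that maps speech features plus a noise vector to upper-body gestures, trained with an unrolled discriminator. The PyTorch sketch below is only an illustration of that general setup: the feature dimensions, layer sizes, and training details are assumptions, not the architecture reported in the paper, and the discriminator unrolling is noted in a comment rather than implemented.

```python
# Sketch of a speech-conditioned GAN for gesture generation (illustrative only).
import torch
import torch.nn as nn

SPEECH_DIM, NOISE_DIM, POSE_DIM = 26, 16, 30  # hypothetical feature sizes

class Generator(nn.Module):
    """Maps a speech feature vector plus a noise vector to an upper-body pose."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SPEECH_DIM + NOISE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, POSE_DIM),
        )
    def forward(self, speech, noise):
        return self.net(torch.cat([speech, noise], dim=-1))

class Discriminator(nn.Module):
    """Scores (speech, pose) pairs as real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SPEECH_DIM + POSE_DIM, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1),
        )
    def forward(self, speech, pose):
        return self.net(torch.cat([speech, pose], dim=-1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(speech, real_pose):
    batch = speech.size(0)
    noise = torch.randn(batch, NOISE_DIM)
    fake_pose = G(speech, noise)

    # Discriminator: real (speech, pose) pairs vs. generated ones.
    d_loss = bce(D(speech, real_pose), torch.ones(batch, 1)) + \
             bce(D(speech, fake_pose.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator. The paper additionally unrolls several
    # discriminator updates before this loss is computed (unrolled GAN), which
    # this sketch omits.
    g_loss = bce(D(speech, fake_pose), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage with random tensors standing in for speech features and poses.
print(train_step(torch.randn(8, SPEECH_DIM), torch.randn(8, POSE_DIM)))
```

Because the generator takes a noise input alongside the speech features, sampling different noise vectors for the same speech yields different plausible gestures, which is the conditional-distribution view the abstract contrasts with injective speech-to-gesture mappings.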