A multimodal approach for modeling engagement in conversation

Recently, engagement has emerged as a key variable explaining the success of conversation. From the perspective of human-machine interaction, automatic assessment of engagement is crucial to better understand the dynamics of an interaction and to design socially-aware robots. This paper presen...

Bibliographic Details
Main Authors: Arthur Pellet-Rostaing, Roxane Bertrand, Auriane Boudin, Stéphane Rauzy, Philippe Blache
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-03-01
Series: Frontiers in Computer Science
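Subjects: engagement model, multimodality, conversational skills, conversational agents, engagement classification, annotated corpora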
Online Access: https://www.frontiersin.org/articles/10.3389/fcomp.2023.1062342/full
author Arthur Pellet-Rostaing
Roxane Bertrand
Auriane Boudin
Stéphane Rauzy
Philippe Blache
author_sort Arthur Pellet-Rostaing
collection DOAJ
description Recently, engagement has emerged as a key variable explaining the success of conversation. From the perspective of human-machine interaction, automatic assessment of engagement is crucial to better understand the dynamics of an interaction and to design socially-aware robots. This paper presents a predictive model of the level of engagement in conversations. In particular, it shows the value of using a rich multimodal set of features, which outperforms existing models in this domain. In terms of methodology, the study is based on two audio-visual corpora of naturalistic face-to-face interactions. These resources have been enriched with various annotations of verbal and nonverbal behaviors, such as smiles, head nods, and feedback. In addition, we manually annotated gesture intensity. Based on a review of previous work in psychology and human-machine interaction, we propose a new definition of the notion of engagement, suitable for describing this phenomenon in both natural and mediated environments. This definition has been implemented in our annotation scheme. In our work, engagement is studied at the turn level, which is known to be crucial for the organization of the conversation. Even though there is still no consensus on the precise definition of turns, we have developed a turn detection tool. A multimodal characterization of engagement is performed using a multi-level classification of turns. We claim that a set of multimodal cues, involving prosodic, mimo-gestural and morpho-syntactic information, is relevant for characterizing the level of engagement of speakers in conversation. Our results significantly outperform the baseline and reach state-of-the-art level (0.76 weighted F-score). The modalities that contribute most are identified by testing the performance of a two-layer perceptron when trained on unimodal feature sets and on combinations of two to four modalities. These results support our claim about multimodality: combining features related to the speech fundamental frequency and energy with mimo-gestural features leads to the best performance.
first_indexed 2024-04-10T06:18:19Z
format Article
id doaj.art-44534bb2a5314b7291fe21d1b6d127aa
institution Directory Open Access Journal
issn 2624-9898
language English
last_indexed 2024-04-10T06:18:19Z
publishDate 2023-03-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Computer Science
spelling doaj.art-44534bb2a5314b7291fe21d1b6d127aa
Record date: 2023-03-02T05:04:55Z
Language: eng
Publisher: Frontiers Media S.A.
Journal: Frontiers in Computer Science
ISSN: 2624-9898
Published: 2023-03-01
Volume: 5
DOI: 10.3389/fcomp.2023.1062342
Article number: 1062342
Title: A multimodal approach for modeling engagement in conversation
Authors: Arthur Pellet-Rostaing, Roxane Bertrand, Auriane Boudin, Stéphane Rauzy, Philippe Blache
Affiliations (both listed for each author): Laboratoire Parole and Langage (LPL-CNRS), Aix-en-Provence, France; Institute of Language, Communication and the Brain (ILCB), Marseille, France
title A multimodal approach for modeling engagement in conversation
topic engagement model
multimodality
conversational skills
conversational agents
engagement classification
annotated corpora
url https://www.frontiersin.org/articles/10.3389/fcomp.2023.1062342/full