Different facial cues for different speech styles in Mandarin tone articulation

Visual facial information, particularly hyperarticulated lip movements in clear speech, has been shown to benefit segmental speech perception. Little research has focused on prosody, such as lexical tone, presumably because production of prosody primarily involves laryngeal activities not necessaril...

Bibliographic Details
Main Authors:	Saurabh Garg, Ghassan Hamarneh, Joan Sereno, Allard Jongman, Yue Wang
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-04-01
Series:	Frontiers in Communication
Subjects:	speech style, Mandarin, facial cues, computer vision, video processing, Mandarin tones
Online Access:	https://www.frontiersin.org/articles/10.3389/fcomm.2023.1148240/full

_version_	1797837988097425408
author	Saurabh Garg, Ghassan Hamarneh, Joan Sereno, Allard Jongman, Yue Wang
author_facet	Saurabh Garg, Ghassan Hamarneh, Joan Sereno, Allard Jongman, Yue Wang
author_sort	Saurabh Garg
collection	DOAJ
description	Visual facial information, particularly hyperarticulated lip movements in clear speech, has been shown to benefit segmental speech perception. Little research has focused on prosody, such as lexical tone, presumably because production of prosody primarily involves laryngeal activities not necessarily distinguishable through visible articulatory movements. However, there is evidence that head, eyebrow, and lip movements correlate with production of pitch-related variations. One subsequent question is whether such visual cues are linguistically meaningful. In this study, we compare movements of the head, eyebrows and lips associated with plain (conversational) vs. clear speech styles of Mandarin tone articulation to examine the extent to which clear-speech modifications involve signal-based overall exaggerated facial movements or code-based enhancement of linguistically relevant articulatory movements. Applying computer-vision techniques to recorded speech, visible movements of the frontal face were tracked and measured for 20 native Mandarin speakers speaking in two speech styles: plain and clear. Thirty-three head, eyebrow and lip movement features based on distance, time, and kinematics were extracted from each individual tone word. A random forest classifier was used to identify the important features that differentiate the two styles across tones and for each tone. Mixed-effects models were then performed to determine the features that were significantly different between the two styles. Overall, for all the four Mandarin tones, we found longer duration and greater movements of the head, eyebrows, and lips in clear speech than in plain speech. Additionally, across tones, the maximum movement happened relatively earlier in clear than plain speech. Although limited evidence of tone-specific modifications was also observed, the cues involved overlap with signal-based changes. These findings suggest that visual facial tonal modifications for clear speech primarily adopt signal-based general emphatic cues that strengthen signal saliency.
first_indexed	2024-04-09T15:33:31Z
format	Article
id	doaj.art-9ecaf6e6628742c2af7a8e8510e2704c
institution	Directory Open Access Journal
issn	2297-900X
language	English
last_indexed	2024-04-09T15:33:31Z
publishDate	2023-04-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Communication
spelling	doaj.art-9ecaf6e6628742c2af7a8e8510e2704c; 2023-04-28T04:58:57Z; eng; Frontiers Media S.A.; Frontiers in Communication; 2297-900X; 2023-04-01; 8; 10.3389/fcomm.2023.1148240; 1148240; Different facial cues for different speech styles in Mandarin tone articulation; Saurabh Garg [0]; Ghassan Hamarneh [1]; Joan Sereno [2]; Allard Jongman [3]; Yue Wang [4]; [0] Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada; [1] School of Computing Science, Simon Fraser University, Burnaby, BC, Canada; [2] Department of Linguistics, University of Kansas, Lawrence, KS, United States; [3] Department of Linguistics, University of Kansas, Lawrence, KS, United States; [4] Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada; https://www.frontiersin.org/articles/10.3389/fcomm.2023.1148240/full; speech style; Mandarin; facial cues; computer vision; video processing; Mandarin tones
title	Different facial cues for different speech styles in Mandarin tone articulation
topic	speech style, Mandarin, facial cues, computer vision, video processing, Mandarin tones
url	https://www.frontiersin.org/articles/10.3389/fcomm.2023.1148240/full

Similar Items