Different facial cues for different speech styles in Mandarin tone articulation

Visual facial information, particularly hyperarticulated lip movements in clear speech, has been shown to benefit segmental speech perception. Little research has focused on prosody, such as lexical tone, presumably because production of prosody primarily involves laryngeal activities not necessaril...

Bibliographic Details
Main Authors: Saurabh Garg, Ghassan Hamarneh, Joan Sereno, Allard Jongman, Yue Wang
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-04-01
Series: Frontiers in Communication
Subjects: speech style, Mandarin, facial cues, computer vision, video processing, Mandarin tones
Online Access: https://www.frontiersin.org/articles/10.3389/fcomm.2023.1148240/full
author Saurabh Garg
Ghassan Hamarneh
Joan Sereno
Allard Jongman
Yue Wang
collection DOAJ
description Visual facial information, particularly hyperarticulated lip movements in clear speech, has been shown to benefit segmental speech perception. Little research has focused on prosody, such as lexical tone, presumably because production of prosody primarily involves laryngeal activities not necessarily distinguishable through visible articulatory movements. However, there is evidence that head, eyebrow, and lip movements correlate with production of pitch-related variations. One subsequent question is whether such visual cues are linguistically meaningful. In this study, we compare movements of the head, eyebrows and lips associated with plain (conversational) vs. clear speech styles of Mandarin tone articulation to examine the extent to which clear-speech modifications involve signal-based overall exaggerated facial movements or code-based enhancement of linguistically relevant articulatory movements. Applying computer-vision techniques to recorded speech, visible movements of the frontal face were tracked and measured for 20 native Mandarin speakers speaking in two speech styles: plain and clear. Thirty-three head, eyebrow and lip movement features based on distance, time, and kinematics were extracted from each individual tone word. A random forest classifier was used to identify the important features that differentiate the two styles across tones and for each tone. Mixed-effects models were then performed to determine the features that were significantly different between the two styles. Overall, for all the four Mandarin tones, we found longer duration and greater movements of the head, eyebrows, and lips in clear speech than in plain speech. Additionally, across tones, the maximum movement happened relatively earlier in clear than plain speech. Although limited evidence of tone-specific modifications was also observed, the cues involved overlap with signal-based changes. These findings suggest that visual facial tonal modifications for clear speech primarily adopt signal-based general emphatic cues that strengthen signal saliency.
first_indexed 2024-04-09T15:33:31Z
format Article
id doaj.art-9ecaf6e6628742c2af7a8e8510e2704c
institution Directory Open Access Journal
issn 2297-900X
language English
last_indexed 2024-04-09T15:33:31Z
publishDate 2023-04-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Communication
spelling doaj.art-9ecaf6e6628742c2af7a8e8510e2704c
2023-04-28T04:58:57Z
eng
Frontiers Media S.A.
Frontiers in Communication
2297-900X
2023-04-01
8
10.3389/fcomm.2023.1148240
1148240
Different facial cues for different speech styles in Mandarin tone articulation
Saurabh Garg
Ghassan Hamarneh
Joan Sereno
Allard Jongman
Yue Wang
Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Department of Linguistics, University of Kansas, Lawrence, KS, United States
Department of Linguistics, University of Kansas, Lawrence, KS, United States
Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada
title Different facial cues for different speech styles in Mandarin tone articulation
topic speech style
Mandarin
facial cues
computer vision
video processing
Mandarin tones
url https://www.frontiersin.org/articles/10.3389/fcomm.2023.1148240/full
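
The description above mentions movement features "based on distance, time, and kinematics" extracted from each tone word, but this record does not list the thirty-three features themselves. The following is only a minimal sketch of what such measures could look like for a single tracked point, assuming a one-dimensional trajectory sampled at known times. The names `MovementFeatures` and `extract_features`, the four measures chosen, and the sample numbers are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MovementFeatures:
    duration: float              # time feature: seconds from first to last sample
    max_displacement: float      # distance feature: largest excursion from the start
    relative_time_of_max: float  # time feature: when the maximum occurs (0..1)
    peak_speed: float            # kinematic feature: largest sample-to-sample speed


def extract_features(times: Sequence[float],
                     positions: Sequence[float]) -> MovementFeatures:
    """Summarise one tracked point's 1-D trajectory over a single word."""
    if len(times) != len(positions) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")

    duration = times[-1] - times[0]

    # Displacement of every sample relative to the starting position.
    start = positions[0]
    displacements = [abs(p - start) for p in positions]
    max_displacement = max(displacements)

    # Position of the first maximum, expressed as a fraction of the duration.
    idx = displacements.index(max_displacement)
    relative_time_of_max = (times[idx] - times[0]) / duration

    # Speed between consecutive samples (finite difference).
    speeds = [
        abs(positions[i + 1] - positions[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    return MovementFeatures(duration, max_displacement,
                            relative_time_of_max, max(speeds))


# Made-up trajectories for illustration only (not data from the study).
plain = extract_features([0.0, 0.1, 0.2, 0.3, 0.4],
                         [0.0, 1.0, 2.0, 3.0, 1.0])
clear = extract_features([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                         [0.0, 3.0, 5.0, 4.0, 3.0, 2.0, 1.0])

print(plain)
print(clear)
```

In this toy pair, the "clear" trajectory was constructed to be longer, larger, and earlier-peaking than the "plain" one, mirroring the direction of the differences the description reports. Per-word feature vectors of this kind are the sort of input a classifier or a mixed-effects model would then compare across styles.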