Different facial cues for different speech styles in Mandarin tone articulation
Main Authors: | Saurabh Garg, Ghassan Hamarneh, Joan Sereno, Allard Jongman, Yue Wang |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2023-04-01 |
Series: | Frontiers in Communication |
Subjects: | speech style; Mandarin; facial cues; computer vision; video processing; Mandarin tones |
Online Access: | https://www.frontiersin.org/articles/10.3389/fcomm.2023.1148240/full |
_version_ | 1797837988097425408 |
---|---|
author | Saurabh Garg, Ghassan Hamarneh, Joan Sereno, Allard Jongman, Yue Wang |
collection | DOAJ |
description | Visual facial information, particularly hyperarticulated lip movements in clear speech, has been shown to benefit segmental speech perception. Little research has focused on prosody, such as lexical tone, presumably because production of prosody primarily involves laryngeal activities not necessarily distinguishable through visible articulatory movements. However, there is evidence that head, eyebrow, and lip movements correlate with production of pitch-related variations. One subsequent question is whether such visual cues are linguistically meaningful. In this study, we compare movements of the head, eyebrows and lips associated with plain (conversational) vs. clear speech styles of Mandarin tone articulation to examine the extent to which clear-speech modifications involve signal-based overall exaggerated facial movements or code-based enhancement of linguistically relevant articulatory movements. With computer-vision techniques applied to recorded speech, visible movements of the frontal face were tracked and measured for 20 native Mandarin speakers speaking in two speech styles: plain and clear. Thirty-three head, eyebrow and lip movement features based on distance, time, and kinematics were extracted from each individual tone word. A random forest classifier was used to identify the important features that differentiate the two styles across tones and for each tone. Mixed-effects models were then fitted to determine the features that were significantly different between the two styles. Overall, for all four Mandarin tones, we found longer duration and greater movements of the head, eyebrows, and lips in clear speech than in plain speech. Additionally, across tones, the maximum movement happened relatively earlier in clear than plain speech. Although limited evidence of tone-specific modifications was also observed, the cues involved overlap with signal-based changes. These findings suggest that visual facial tonal modifications for clear speech primarily adopt signal-based general emphatic cues that strengthen signal saliency. |
first_indexed | 2024-04-09T15:33:31Z |
format | Article |
id | doaj.art-9ecaf6e6628742c2af7a8e8510e2704c |
institution | Directory Open Access Journal |
issn | 2297-900X |
language | English |
last_indexed | 2024-04-09T15:33:31Z |
publishDate | 2023-04-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Communication |
volume | 8 |
article_number | 1148240 |
doi | 10.3389/fcomm.2023.1148240 |
affiliations | Saurabh Garg: Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada; Ghassan Hamarneh: School of Computing Science, Simon Fraser University, Burnaby, BC, Canada; Joan Sereno: Department of Linguistics, University of Kansas, Lawrence, KS, United States; Allard Jongman: Department of Linguistics, University of Kansas, Lawrence, KS, United States; Yue Wang: Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada |
title | Different facial cues for different speech styles in Mandarin tone articulation |
topic | speech style; Mandarin; facial cues; computer vision; video processing; Mandarin tones |
url | https://www.frontiersin.org/articles/10.3389/fcomm.2023.1148240/full |
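
The description states that head, eyebrow and lip features "based on distance, time, and kinematics" were extracted from each tone word, but the record does not list the 33 features. As a minimal sketch, assuming a hypothetical input (one facial landmark's per-frame vertical position over a single tone word, already produced by some face-tracking step), the Python below computes one illustrative feature of each kind. `extract_features` and `MovementFeatures` are hypothetical names; this is not the authors' code or their feature definitions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MovementFeatures:
    duration_s: float        # time: token duration in seconds
    max_displacement: float  # distance: largest excursion from the first frame
    rel_peak_time: float     # time: position of the largest excursion (0 = onset, 1 = offset)
    peak_velocity: float     # kinematics: largest absolute frame-to-frame speed (units/s)


def extract_features(trajectory: Sequence[float], fps: float) -> MovementFeatures:
    """Illustrative features for one hypothetical landmark track of one tone word.

    Assumption: `trajectory` holds the landmark's vertical position per video
    frame (any consistent unit) and `fps` is the video frame rate.
    """
    if len(trajectory) < 2:
        raise ValueError("need at least two frames")
    if fps <= 0:
        raise ValueError("fps must be positive")
    n = len(trajectory)
    duration = (n - 1) / fps
    # Excursion of each frame relative to the first (onset) frame.
    baseline = trajectory[0]
    excursions = [abs(x - baseline) for x in trajectory]
    # Index of the largest excursion (first one if there are ties).
    peak_idx = max(range(n), key=excursions.__getitem__)
    # Absolute speed between consecutive frames, via finite differences.
    velocities = [abs(b - a) * fps for a, b in zip(trajectory, trajectory[1:])]
    return MovementFeatures(
        duration_s=duration,
        max_displacement=excursions[peak_idx],
        rel_peak_time=peak_idx / (n - 1),
        peak_velocity=max(velocities),
    )


if __name__ == "__main__":
    # Toy five-frame track sampled at 10 frames per second.
    print(extract_features([0.0, 1.0, 3.0, 2.0, 0.0], fps=10))
```

The relative peak time is the kind of measure behind the reported finding that maximum movement occurred relatively earlier in clear than in plain speech. Ranking such features and testing style differences, which the abstract attributes to a random forest classifier and mixed-effects models, is outside the scope of this sketch.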