A digital “flat affect”? Popular speech compression codecs and their effects on emotional prosody

Bibliographic Details
Main Authors: Oliver Niebuhr, Ingo Siegert
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-03-01
Series: Frontiers in Communication
Subjects: prosody; coded speech; emotional speech; phonetics; distant meetings; degraded speech
Online Access: https://www.frontiersin.org/articles/10.3389/fcomm.2023.972182/full
author Oliver Niebuhr
Ingo Siegert
collection DOAJ
description Introduction: Calls via video apps, mobile phones, and similar digital channels are a rapidly growing form of speech communication. Such calls are not only (and perhaps less and less) about exchanging content, but also about creating, maintaining, and expanding social and business networks. In the phonetic code of speech, these social and emotional signals are considerably shaped by (or encoded in) prosody. However, according to previous studies, it is precisely this prosody that is significantly distorted by modern compression codecs. As a result, the identification of emotions becomes blurred and can even be lost to the extent that opposing emotions such as joy and anger, or disgust and sadness, are no longer differentiated on the recipients' side. The present study searches for the acoustic origins of these perceptual findings.
Method: A set of 108 sentences from the Berlin Database of Emotional Speech served as the speech material in our study. The sentences were realized by professional actors (2 male, 2 female) with seven different emotions (neutral, fear, disgust, joy, boredom, anger, sadness) and acoustically analyzed in the original uncompressed (WAV) version as well as in strongly compressed versions based on the four popular codecs AMR-WB, MP3, OPUS, and SPEEX. The analysis included 6 tonal (i.e., f0-related) and 7 non-tonal prosodic parameters (e.g., formants as well as acoustic-energy and spectral-slope estimates).
Results: The results show significant, codec-specific distortion effects on all 13 prosodic parameter measurements compared to the WAV reference condition. Mean values of the automatic measurements can, across sentences, deviate by up to 20% from the values of the WAV reference condition. Moreover, the effects go in opposite directions for tonal and non-tonal parameters. Speech compression distorts tonal parameters such that the acoustic differences between emotions increase, whereas it distorts non-tonal parameters such that the acoustic-prosodic profiles of emotions become more similar to each other, particularly under MP3 and SPEEX compression.
Discussion: The term “flat affect” comes from the medical field and describes a person's inability to express or display emotions. So, does strong compression of emotional speech create a “digital flat affect”? The answer to this question is a conditional “yes”. We provide clear evidence for a “digital flat affect”. However, it seems less strongly pronounced in the present acoustic measurements than in previous perception data, and it manifests itself more strongly in non-tonal than in tonal parameters. We discuss the practical implications of our findings for the everyday use of digital communication devices and critically reflect on the generalizability of our findings, also with respect to their origins in the codecs' inner mechanics.
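The Results rest on two quantities: how far a codec shifts a measured value away from the WAV reference (in percent), and whether a codec pushes the emotions' acoustic profiles apart or together. A minimal Python sketch of both follows, assuming a single prosodic parameter and invented per-emotion means (not the study's data). The spread index (population standard deviation of the per-emotion means) and the names `percent_deviation`, `between_emotion_spread`, and `flattening_ratio` are hypothetical illustrations, not the authors' statistical procedure.

```python
from statistics import pstdev

# HYPOTHETICAL per-emotion means of ONE prosodic parameter (e.g., mean f0 in Hz).
# These numbers are invented for illustration; they are NOT the study's data.
wav = {"neutral": 120.0, "joy": 180.0, "anger": 190.0, "sadness": 110.0}
codec = {"neutral": 125.0, "joy": 170.0, "anger": 175.0, "sadness": 118.0}


def percent_deviation(measured, reference):
    """Signed deviation of a coded-speech value from the WAV reference, in percent."""
    return 100.0 * (measured - reference) / reference


def between_emotion_spread(profile):
    """Population standard deviation of the per-emotion means.

    An ASSUMED, simple index of how acoustically distinct the emotions are.
    """
    return pstdev(profile.values())


# Per-emotion deviation of the codec condition from the WAV reference.
deviations = {e: percent_deviation(codec[e], wav[e]) for e in wav}
max_abs_dev = max(abs(d) for d in deviations.values())

spread_wav = between_emotion_spread(wav)
spread_codec = between_emotion_spread(codec)

# Ratio < 1: emotions moved closer together ("flattening").
# Ratio > 1: the codec exaggerated the differences between emotions.
flattening_ratio = spread_codec / spread_wav

print(f"max |deviation| = {max_abs_dev:.1f}%")
print(f"spread ratio codec/WAV = {flattening_ratio:.2f}")
```

With these invented values the ratio falls below 1, the pattern the abstract reports for non-tonal parameters; a ratio above 1 would correspond to the increased separation reported for tonal parameters.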
first_indexed 2024-04-09T22:13:26Z
format Article
id doaj.art-f5664db49ae545d69a8a2ee4b7292212
institution Directory Open Access Journal
issn 2297-900X
language English
last_indexed 2024-04-09T22:13:26Z
publishDate 2023-03-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Communication
spelling doaj.art-f5664db49ae545d69a8a2ee4b7292212; 2023-03-23T06:20:06Z; eng; Frontiers Media S.A.; Frontiers in Communication; 2297-900X; 2023-03-01; vol. 8; 10.3389/fcomm.2023.972182; article 972182
Oliver Niebuhr: Centre for Industrial Electronics, University of Southern Denmark, Sonderborg, Denmark
Ingo Siegert: Mobile Dialog Systems, Institute for Information Technology and Communication, Otto von Guericke University Magdeburg, Magdeburg, Germany
title A digital “flat affect”? Popular speech compression codecs and their effects on emotional prosody
topic prosody
coded speech
emotional speech
phonetics
distant meetings
degraded speech
url https://www.frontiersin.org/articles/10.3389/fcomm.2023.972182/full