Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments

In one study on vocal emotion recognition using noise-vocoded speech (NVS), the high similarities between modulation spectral features (MSFs) and the results of vocal-emotion-recognition experiments indicated that MSFs contribute to vocal emotion recognition in a clean environment (with no noise and...


Bibliographic Details
Main Authors: Taiyang Guo, Zhi Zhu, Shunsuke Kidani, Masashi Unoki
Format: Article
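Language: English
Published: MDPI AG 2022-10-01
Series: Applied Sciences
Subjects: modulation spectral feature; vocal emotion recognition; noise-vocoded speech; noisy reverberant environment
Online Access: https://www.mdpi.com/2076-3417/12/19/9979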
_version_ 1797480571940634624
author Taiyang Guo
Zhi Zhu
Shunsuke Kidani
Masashi Unoki
collection DOAJ
description In one study on vocal emotion recognition using noise-vocoded speech (NVS), the high similarities between modulation spectral features (MSFs) and the results of vocal-emotion-recognition experiments indicated that MSFs contribute to vocal emotion recognition in a clean environment (with no noise and no reverberation). Other studies also clarified that vocal emotion recognition using NVS is not affected by noisy reverberant environments when the signal-to-noise ratio is greater than 10 dB and the reverberation time is less than 1.0 s. However, the contribution of MSFs to vocal emotion recognition in noisy reverberant environments is still unclear. We aimed to clarify whether MSFs can be used to explain the vocal-emotion-recognition results in noisy reverberant environments. We analyzed the results of vocal-emotion-recognition experiments and used an auditory-based modulation filterbank to calculate the modulation spectrograms of NVS. We then extracted ten MSFs as higher-order statistics of the modulation spectrograms. The relationship between the MSFs and the vocal-emotion-recognition results showed that, except in extremely noisy and reverberant environments, the MSFs were highly similar to the recognition results, which indicates that MSFs can be used to explain such results in noisy reverberant environments. We also found that two common MSFs, MSKT_k (modulation spectral kurtosis) and MSTL_k (modulation spectral tilt), contribute to vocal emotion recognition in all daily environments.
first_indexed 2024-03-09T22:01:59Z
format Article
id doaj.art-d1e8155857dc4352bcc5181d8ef1a172
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T22:01:59Z
publishDate 2022-10-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-d1e8155857dc4352bcc5181d8ef1a172 (2023-11-23T19:48:55Z); eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2022-10-01; vol. 12, no. 19, art. 9979; doi 10.3390/app12199979
Taiyang Guo: Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi 923-1292, Japan
Zhi Zhu: Fairy Devices Inc., 7F Yushima Urban Bldg., 2-31-22 Bunkyo-ku, Tokyo 113-0034, Japan
Shunsuke Kidani: Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi 923-1292, Japan
Masashi Unoki: Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi 923-1292, Japan
title Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments
topic modulation spectral feature
vocal emotion recognition
noise-vocoded speech
noisy reverberant environment
url https://www.mdpi.com/2076-3417/12/19/9979