Whispered Speech Conversion Based on the Inversion of Mel Frequency Cepstral Coefficient Features

A conversion method based on the inversion of Mel frequency cepstral coefficient (MFCC) features was proposed to convert whispered speech into normal speech. First, the MFCC features of whispered speech and normal speech were extracted and a matching relation between the MFCC feature parameters of w...

Bibliographic Details
Main Authors: Qiang Zhu, Zhong Wang, Yunfeng Dou, Jian Zhou
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Algorithms
Subjects: whispered speech conversion; MFCC feature inversion; Gaussian mixture model; cepstral distortion
Online Access: https://www.mdpi.com/1999-4893/15/2/68
collection DOAJ
description A method based on the inversion of Mel frequency cepstral coefficient (MFCC) features is proposed for converting whispered speech into normal speech. First, MFCC features are extracted from whispered speech and normal speech, and a Gaussian mixture model (GMM) is used to establish the matching relation between the MFCC feature parameters of the two. The GMM then yields the normal-speech MFCC parameters that correspond to a whispered input, and the whispered speech is finally converted into normal speech by inverting those MFCC features. In the experiments, the cepstral distortion (CD) of speech converted by the proposed method was 21% lower than that of speech converted using linear predictive coefficient (LPC) features, and the mean opinion score (MOS) was 3.56, indicating a satisfactory outcome in both intelligibility and sound quality.
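The record does not state how cepstral distortion was computed. As a rough illustration only, the sketch below assumes the widely used mel-cepstral distortion definition (in dB, averaged over time-aligned frames, with the energy coefficient already dropped); the paper's exact formula may differ. The function names `cepstral_distortion_db` and `relative_reduction` are hypothetical and are not taken from the article.

```python
import math


def cepstral_distortion_db(ref_frames, conv_frames):
    """Mean cepstral distortion in dB between two MFCC sequences.

    Assumption: both sequences are already time-aligned (same number of
    frames) and the 0th (energy) coefficient has been removed.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("sequences must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)  # converts natural-log cepstra to dB
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        if len(ref) != len(conv):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(ref_frames)


def relative_reduction(baseline_cd, proposed_cd):
    """Fraction by which proposed_cd is lower than baseline_cd."""
    return (baseline_cd - proposed_cd) / baseline_cd
```

Under this assumed definition, a statement such as "21% less CD than the LPC-based conversion" corresponds to `relative_reduction(cd_lpc, cd_mfcc)` being about 0.21, where both values are measured against the same reference normal speech.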
id doaj.art-0b75b1d0d98342de857df9bfd72bf7fa
institution Directory Open Access Journal
issn 1999-4893
spelling doaj.art-0b75b1d0d98342de857df9bfd72bf7fa (2023-11-23T18:24:31Z)
Language: eng
Publisher: MDPI AG
Journal: Algorithms (ISSN 1999-4893)
Published: 2022-02-01, volume 15, issue 2, article 68
DOI: 10.3390/a15020068
Title: Whispered Speech Conversion Based on the Inversion of Mel Frequency Cepstral Coefficient Features
Authors and affiliations:
Qiang Zhu, School of Computer Science and Technology, Hefei Normal University, Hefei 230601, China
Zhong Wang, School of Computer Science and Technology, Hefei Normal University, Hefei 230601, China
Yunfeng Dou, School of Computer Science and Technology, Anhui University, Hefei 230601, China
Jian Zhou, School of Computer Science and Technology, Anhui University, Hefei 230601, China
Online Access: https://www.mdpi.com/1999-4893/15/2/68
Keywords: whispered speech conversion; MFCC feature inversion; Gaussian mixture model; cepstral distortion