An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa

The prediction of pause fillers plays a crucial role in enhancing the naturalness of synthesized speech. In recent years, neural networks, including LSTM, BERT, and XLNet, have been employed for pause fillers prediction modules. However, these methods have exhibited relatively lower accuracy in pred...


Bibliographic Details
Main Authors:	Ling Yu, Xiaoqun Zhou, Fanglin Niu
Format:	Article
Language:	English
Published:	MDPI AG 2023-09-01
Series:	Applied Sciences
Subjects:	RoBERTa; naturalness of speech; speech synthesis; Chinese pause fillers; prediction module
Online Access:	https://www.mdpi.com/2076-3417/13/19/10652

_version_	1797576237011435520
author	Ling Yu, Xiaoqun Zhou, Fanglin Niu
author_facet	Ling Yu, Xiaoqun Zhou, Fanglin Niu
author_sort	Ling Yu
collection	DOAJ
description	The prediction of pause fillers plays a crucial role in enhancing the naturalness of synthesized speech. In recent years, neural networks, including LSTM, BERT, and XLNet, have been employed for pause fillers prediction modules. However, these methods have exhibited relatively lower accuracy in predicting pause fillers. This paper introduces the utilization of the RoBERTa model for predicting Chinese pause fillers and presents a novel approach to training the RoBERTa model, effectively enhancing the accuracy of Chinese pause fillers prediction. Our proposed approach involves categorizing text from different speakers into four distinct style groups based on the frequency and position of Chinese pause fillers. The RoBERTa model is trained on these four groups of data, which incorporate different styles of fillers, thereby ensuring a more natural synthesis of speech. The Chinese pause fillers prediction module is evaluated on systems such as Parallel Tacotron2, FastPitch, and Deep Voice3, achieving a notable 26.7% improvement in word-level prediction accuracy compared to the BERT model, along with a 14% enhancement in position-level prediction accuracy. This substantial improvement results in a significant enhancement of the naturalness of the generated speech.
first_indexed	2024-03-10T21:49:26Z
format	Article
id	doaj.art-42a3860239f3422d8638e327922c8b3e
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T21:49:26Z
publishDate	2023-09-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-42a3860239f3422d8638e327922c8b3e; 2023-11-19T14:02:21Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2023-09-01; vol. 13, no. 19, art. 10652; doi:10.3390/app131910652; An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa; Ling Yu (0), Xiaoqun Zhou (1), Fanglin Niu (2); (0) School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China; (1) School of Electronics and Information Engineering, Shenyang University of Technology, Shenyang 110000, China; (2) School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China; https://www.mdpi.com/2076-3417/13/19/10652; RoBERTa; naturalness of speech; speech synthesis; Chinese pause fillers; prediction module
title	An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa
topic	RoBERTa; naturalness of speech; speech synthesis; Chinese pause fillers; prediction module
url	https://www.mdpi.com/2076-3417/13/19/10652


Similar Items