An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa

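The prediction of pause fillers plays a crucial role in enhancing the naturalness of synthesized speech. In recent years, neural networks, including LSTM, BERT, and XLNet, have been employed for pause fillers prediction modules. However, these methods have exhibited relatively lower accuracy in predicting pause fillers. This paper introduces the utilization of the RoBERTa model for predicting Chinese pause fillers and presents a novel approach to training the RoBERTa model, effectively enhancing the accuracy of Chinese pause fillers prediction. Our proposed approach involves categorizing text from different speakers into four distinct style groups based on the frequency and position of Chinese pause fillers. The RoBERTa model is trained on these four groups of data, which incorporate different styles of fillers, thereby ensuring a more natural synthesis of speech. The Chinese pause fillers prediction module is evaluated on systems such as Parallel Tacotron2, FastPitch, and Deep Voice3, achieving a notable 26.7% improvement in word-level prediction accuracy compared to the BERT model, along with a 14% enhancement in position-level prediction accuracy. This substantial improvement results in a significant enhancement of the naturalness of the generated speech.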

Bibliographic Details
Main Authors: Ling Yu, Xiaoqun Zhou, Fanglin Niu
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Applied Sciences
Subjects: RoBERTa; naturalness of speech; speech synthesis; Chinese pause fillers; prediction module
Online Access: https://www.mdpi.com/2076-3417/13/19/10652
_version_ 1797576237011435520
author Ling Yu
Xiaoqun Zhou
Fanglin Niu
author_facet Ling Yu
Xiaoqun Zhou
Fanglin Niu
author_sort Ling Yu
collection DOAJ
description The prediction of pause fillers plays a crucial role in enhancing the naturalness of synthesized speech. In recent years, neural networks, including LSTM, BERT, and XLNet, have been employed for pause fillers prediction modules. However, these methods have exhibited relatively lower accuracy in predicting pause fillers. This paper introduces the utilization of the RoBERTa model for predicting Chinese pause fillers and presents a novel approach to training the RoBERTa model, effectively enhancing the accuracy of Chinese pause fillers prediction. Our proposed approach involves categorizing text from different speakers into four distinct style groups based on the frequency and position of Chinese pause fillers. The RoBERTa model is trained on these four groups of data, which incorporate different styles of fillers, thereby ensuring a more natural synthesis of speech. The Chinese pause fillers prediction module is evaluated on systems such as Parallel Tacotron2, FastPitch, and Deep Voice3, achieving a notable 26.7% improvement in word-level prediction accuracy compared to the BERT model, along with a 14% enhancement in position-level prediction accuracy. This substantial improvement results in a significant enhancement of the naturalness of the generated speech.
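The description says that text from different speakers is categorized into four style groups by the frequency and position of pause fillers, but this record does not state the grouping rule. The following is a minimal sketch under stated assumptions, not the authors' method: the filler inventory `FILLERS`, the helper names `filler_stats` and `assign_style_groups`, and the median-split rule (high/low frequency crossed with early/late position) are all hypothetical.

```python
from statistics import median

# Hypothetical filler inventory; the paper's actual list is not given in this record.
FILLERS = {"嗯", "呃", "那个", "就是"}


def filler_stats(sentences):
    """Return (fillers per token, mean relative filler position) for one speaker.

    `sentences` is a list of token lists. Relative position is 0.0 at the
    start of a sentence and 1.0 at its end.
    """
    total_tokens = 0
    positions = []
    for tokens in sentences:
        total_tokens += len(tokens)
        for i, tok in enumerate(tokens):
            if tok in FILLERS:
                positions.append(i / max(len(tokens) - 1, 1))
    rate = len(positions) / total_tokens if total_tokens else 0.0
    mean_pos = sum(positions) / len(positions) if positions else 0.0
    return rate, mean_pos


def assign_style_groups(speakers):
    """Map each speaker id to a group 0-3 using median splits (an assumed rule).

    Group index = 2 * (frequency above median) + (mean position above median).
    """
    stats = {spk: filler_stats(sents) for spk, sents in speakers.items()}
    rate_cut = median(r for r, _ in stats.values())
    pos_cut = median(p for _, p in stats.values())
    return {
        spk: 2 * int(rate > rate_cut) + int(pos > pos_cut)
        for spk, (rate, pos) in stats.items()
    }
```

Under this assumed scheme, each group's sentences could then serve as a separate training subset or as a style label when fine-tuning a filler predictor; how the paper actually uses the four groups is only summarized in the description above.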
first_indexed 2024-03-10T21:49:26Z
format Article
id doaj.art-42a3860239f3422d8638e327922c8b3e
institution Directory of Open Access Journals
issn 2076-3417
language English
last_indexed 2024-03-10T21:49:26Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-42a3860239f3422d8638e327922c8b3e 2023-11-19T14:02:21Z eng MDPI AG Applied Sciences 2076-3417 2023-09-01 13 19 10652 10.3390/app131910652
An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa
Ling Yu (0), Xiaoqun Zhou (1), Fanglin Niu (2)
(0) School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
(1) School of Electronics and Information Engineering, Shenyang University of Technology, Shenyang 110000, China
(2) School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
The prediction of pause fillers plays a crucial role in enhancing the naturalness of synthesized speech. In recent years, neural networks, including LSTM, BERT, and XLNet, have been employed for pause fillers prediction modules. However, these methods have exhibited relatively lower accuracy in predicting pause fillers. This paper introduces the utilization of the RoBERTa model for predicting Chinese pause fillers and presents a novel approach to training the RoBERTa model, effectively enhancing the accuracy of Chinese pause fillers prediction. Our proposed approach involves categorizing text from different speakers into four distinct style groups based on the frequency and position of Chinese pause fillers. The RoBERTa model is trained on these four groups of data, which incorporate different styles of fillers, thereby ensuring a more natural synthesis of speech. The Chinese pause fillers prediction module is evaluated on systems such as Parallel Tacotron2, FastPitch, and Deep Voice3, achieving a notable 26.7% improvement in word-level prediction accuracy compared to the BERT model, along with a 14% enhancement in position-level prediction accuracy. This substantial improvement results in a significant enhancement of the naturalness of the generated speech.
https://www.mdpi.com/2076-3417/13/19/10652
RoBERTa; naturalness of speech; speech synthesis; Chinese pause fillers; prediction module
spellingShingle Ling Yu
Xiaoqun Zhou
Fanglin Niu
An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa
Applied Sciences
RoBERTa
naturalness of speech
speech synthesis
Chinese pause fillers
prediction module
title An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa
title_full An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa
title_fullStr An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa
title_full_unstemmed An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa
title_short An Improved Chinese Pause Fillers Prediction Module Based on RoBERTa
title_sort improved chinese pause fillers prediction module based on roberta
topic RoBERTa
naturalness of speech
speech synthesis
Chinese pause fillers
prediction module
url https://www.mdpi.com/2076-3417/13/19/10652
work_keys_str_mv AT lingyu animprovedchinesepausefillerspredictionmodulebasedonroberta
AT xiaoqunzhou animprovedchinesepausefillerspredictionmodulebasedonroberta
AT fanglinniu animprovedchinesepausefillerspredictionmodulebasedonroberta
AT lingyu improvedchinesepausefillerspredictionmodulebasedonroberta
AT xiaoqunzhou improvedchinesepausefillerspredictionmodulebasedonroberta
AT fanglinniu improvedchinesepausefillerspredictionmodulebasedonroberta