Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths

Speech is critical for interpersonal communication, but not everyone has fluent communication skills. Speech disfluency, including stuttering and interruptions, affects not only emotional expression but also clarity of expression for people who stutter. Existing methods for detecting speech disfluen...

Bibliographic Details
Main Authors:	Jiajun Liu, Aishan Wumaier, Dongping Wei, Shen Guo
Format:	Article
Language:	English
Published:	MDPI AG 2023-06-01
Series:	Applied Sciences
Subjects:	speech disfluency detection, stuttering, limited data, wav2vec2.0, entropy invariance
Online Access:	https://www.mdpi.com/2076-3417/13/13/7579

_version_	1797592132110778368
author	Jiajun Liu, Aishan Wumaier, Dongping Wei, Shen Guo
author_facet	Jiajun Liu, Aishan Wumaier, Dongping Wei, Shen Guo
author_sort	Jiajun Liu
collection	DOAJ
description	Speech is critical for interpersonal communication, but not everyone has fluent communication skills. Speech disfluency, including stuttering and interruptions, affects not only emotional expression but also clarity of expression for people who stutter. Existing methods for detecting speech disfluency rely heavily on annotated data, which can be costly. Additionally, these methods have not considered the issue of variable-length disfluent speech, which limits the scalability of detection methods. To address these limitations, this paper proposes an automated method for detecting speech disfluency that can improve communication skills for individuals and assist therapists in tracking the progress of stuttering patients. The proposed method focuses on detecting four types of disfluency features using single-task detection and utilizes embeddings from the pre-trained wav2vec2.0 model, as well as convolutional neural network (CNN) and Transformer models for feature extraction. The model’s scalability is improved by considering the issue of variable-length disfluent speech and modifying the model based on the entropy invariance of attention mechanisms. The proposed automated method for detecting speech disfluency has the potential to assist individuals in overcoming speech disfluency, improve their communication skills, and aid therapists in tracking the progress of stuttering patients. Additionally, the model’s scalability across languages and lengths enhances its practical applicability. The experiments demonstrate that the model outperforms baseline models in both English and Chinese datasets, proving its universality and scalability in real-world applications.
first_indexed	2024-03-11T01:47:08Z
format	Article
id	doaj.art-75295d9bff2143f6b84c8c34737b3759
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-11T01:47:08Z
publishDate	2023-06-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-75295d9bff2143f6b84c8c34737b3759 | 2023-11-18T16:08:26Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2023-06-01 | 13(13), 7579 | 10.3390/app13137579 | Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths | Jiajun Liu (College of Software, Xinjiang University, Urumqi 830046, China); Aishan Wumaier (Key Laboratory of Multilingual Information Technology in Xinjiang Uyghur Autonomous Region, Urumqi 830046, China); Dongping Wei (Key Laboratory of Multilingual Information Technology in Xinjiang Uyghur Autonomous Region, Urumqi 830046, China); Shen Guo (Key Laboratory of Multilingual Information Technology in Xinjiang Uyghur Autonomous Region, Urumqi 830046, China) | https://www.mdpi.com/2076-3417/13/13/7579 | speech disfluency detection, stuttering, limited data, wav2vec2.0, entropy invariance
title	Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths
topic	speech disfluency detection, stuttering, limited data, wav2vec2.0, entropy invariance
url	https://www.mdpi.com/2076-3417/13/13/7579

