RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model

Recent studies have reported the success of the multi-scale convolutional neural network (MSConvNet) model in many classification applications, owing to its multi-scale convolution block, which extracts multi-scale representations for detection. However, a new design based on...


Bibliographic Details
Main Authors: Wongsathon Pathonsuwan, Khomdet Phapatanaburi, Prawit Buayai, Talit Jumphoo, Patikorn Anchuen, Monthippa Uthansakul, Peerapong Uthansakul
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
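Subjects: Pathological voice detection; end-to-end architecture; multi-scale convolution; spatial-temporal feature; hybrid model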
Online Access: https://ieeexplore.ieee.org/document/9938443/
author Wongsathon Pathonsuwan
Khomdet Phapatanaburi
Prawit Buayai
Talit Jumphoo
Patikorn Anchuen
Monthippa Uthansakul
Peerapong Uthansakul
collection DOAJ
description Recent studies have reported the success of the multi-scale convolutional neural network (MSConvNet) model in many classification applications, owing to its multi-scale convolution block, which extracts multi-scale representations for detection. However, an MSConvNet-based design for pathological voice detection has not been explored. In this paper, we propose RS-MSConvNet, a novel end-to-end MSConvNet model that uses raw speech for pathological voice detection. The main contribution of RS-MSConvNet is a multi-scale convolution block followed by a spatial-temporal feature block and a fully connected layer for classification. In addition, to further improve accuracy, we propose a novel hybrid detection model, called RS-MSConvNet-SVM, that combines the feature extraction ability of RS-MSConvNet with a support vector machine (SVM) classifier. The effectiveness of the proposed models is investigated using the TORGO database. The experimental results show that the RS-MSConvNet model outperforms the baseline methods in the speaker-independent task. Moreover, the RS-MSConvNet-SVM model achieves a further improvement in accuracy over the RS-MSConvNet model. These results indicate that the proposed models are useful for pathological voice detection.
first_indexed 2024-04-12T06:39:23Z
format Article
id doaj.art-b27900e202b54dca95f526b1696d80f6
institution Directory of Open Access Journals
issn 2169-3536
language English
last_indexed 2024-04-12T06:39:23Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-b27900e202b54dca95f526b1696d80f6
Record timestamp: 2022-12-22T03:43:47Z
Language code: eng
Publisher: IEEE
Series: IEEE Access, ISSN 2169-3536
Published: 2022-01-01, volume 10, pages 120450-120461
DOI: 10.1109/ACCESS.2022.3219606
IEEE document number: 9938443
Title: RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model
Authors and affiliations:
(0) Wongsathon Pathonsuwan, School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand
(1) Khomdet Phapatanaburi, https://orcid.org/0000-0002-6487-2073, Department of Telecommunication Engineering, Faculty of Engineering and Technology, Rajamangala University of Technology Isan (RMUTI), Nakhon Ratchasima, Thailand
(2) Prawit Buayai, Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu, Japan
(3) Talit Jumphoo, School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand
(4) Patikorn Anchuen, Navaminda Kasatriyadhiraj Royal Air Force Academy, Bangkok, Thailand
(5) Monthippa Uthansakul, https://orcid.org/0000-0002-9155-3561, School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand
(6) Peerapong Uthansakul, https://orcid.org/0000-0002-7108-9263, School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand
Online access: https://ieeexplore.ieee.org/document/9938443/
Keywords: Pathological voice detection; end-to-end architecture; multi-scale convolution; spatial-temporal feature; hybrid model
title RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model
topic Pathological voice detection
end-to-end architecture
multi-scale convolution
spatial-temporal feature
hybrid model
url https://ieeexplore.ieee.org/document/9938443/
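
Illustrative note on the description field: the abstract mentions a multi-scale convolution block applied to raw speech. The sketch below is not the authors' implementation. It only shows the general idea of running 1-D kernels of several lengths over a raw waveform in parallel and concatenating one pooled value per kernel into a feature vector. The kernel lengths (3, 9, 27), the random weights, the ReLU, the global max pooling, and every function name are assumptions made for illustration; the record does not state the paper's actual layer configuration, and the spatial-temporal feature block and SVM stage are omitted.

```python
import math
import random


def conv1d_valid(signal, kernel):
    """Valid-mode 1-D cross-correlation of a signal with a single kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in values]


def make_kernels(scales, per_scale, seed=0):
    """Hypothetical helper: random kernels, grouped by kernel length (scale)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(per_scale)]
        for size in scales
    ]


def multi_scale_features(signal, kernels_by_scale):
    """Run every kernel of every scale over the signal and keep one
    global-max-pooled activation per kernel, concatenated across scales."""
    features = []
    for kernels in kernels_by_scale:
        for kernel in kernels:
            features.append(max(relu(conv1d_valid(signal, kernel))))
    return features


# Toy "raw speech": 200 samples of a sine wave (assumed input, not TORGO data).
signal = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
kernels = make_kernels(scales=(3, 9, 27), per_scale=4)
features = multi_scale_features(signal, kernels)
print(len(features))  # 3 scales x 4 kernels = 12 features
```

In a hybrid setup like the one the abstract describes, a feature vector of this kind would be handed to a separate classifier such as an SVM. In a trained network the kernel weights would be learned rather than drawn at random.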