RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model
Recent studies have reported the success of the multi-scale convolutional neural network (MSConvNet) model in many classification applications, owing to its multi-scale convolution block, which extracts multi-scale representations for detection. However, a new design based on...
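Main Authors: | Wongsathon Pathonsuwan, Khomdet Phapatanaburi, Prawit Buayai, Talit Jumphoo, Patikorn Anchuen, Monthippa Uthansakul, Peerapong Uthansakul |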
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2022-01-01 |
Series: | IEEE Access |
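Subjects: | Pathological voice detection; end-to-end architecture; multi-scale convolution; spatial-temporal feature; hybrid model |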
Online Access: | https://ieeexplore.ieee.org/document/9938443/ |
_version_ | 1811216457246179328 |
---|---|
author | Wongsathon Pathonsuwan; Khomdet Phapatanaburi; Prawit Buayai; Talit Jumphoo; Patikorn Anchuen; Monthippa Uthansakul; Peerapong Uthansakul |
author_sort | Wongsathon Pathonsuwan |
collection | DOAJ |
description | Recent studies have reported the success of the multi-scale convolutional neural network (MSConvNet) model in many classification applications, owing to its multi-scale convolution block, which extracts multi-scale representations for detection. However, an MSConvNet-based design for pathological voice detection has not been explored. In this paper, we propose RS-MSConvNet, a novel end-to-end MSConvNet model that operates on raw speech for pathological voice detection. The main contribution of RS-MSConvNet is its architecture: a multi-scale convolution block, followed by a spatial-temporal feature block and a fully connected layer for classification. In addition, to further improve accuracy, we propose a novel hybrid detection model, RS-MSConvNet-SVM, which combines the feature extraction of RS-MSConvNet with a support vector machine (SVM) classifier. The effectiveness of the proposed models is evaluated on the TORGO database. The experimental results show that RS-MSConvNet outperforms the baseline methods in the speaker-independent task. Moreover, compared with the RS-MSConvNet model, the RS-MSConvNet-SVM model achieves a further improvement in accuracy. These outcomes indicate that the proposed models are useful for pathological voice detection. |
first_indexed | 2024-04-12T06:39:23Z |
format | Article |
id | doaj.art-b27900e202b54dca95f526b1696d80f6 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-12T06:39:23Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-b27900e202b54dca95f526b1696d80f6; 2022-12-22T03:43:47Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 120450-120461; DOI 10.1109/ACCESS.2022.3219606; 9938443; RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model; Wongsathon Pathonsuwan (School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand); Khomdet Phapatanaburi (https://orcid.org/0000-0002-6487-2073; Department of Telecommunication Engineering, Faculty of Engineering and Technology, Rajamangala University of Technology Isan (RMUTI), Nakhon Ratchasima, Thailand); Prawit Buayai (Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu, Japan); Talit Jumphoo (School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand); Patikorn Anchuen (Navaminda Kasatriyadhiraj Royal Air Force Academy, Bangkok, Thailand); Monthippa Uthansakul (https://orcid.org/0000-0002-9155-3561; School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand); Peerapong Uthansakul (https://orcid.org/0000-0002-7108-9263; School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand); https://ieeexplore.ieee.org/document/9938443/; Pathological voice detection; end-to-end architecture; multi-scale convolution; spatial-temporal feature; hybrid model |
title | RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model |
topic | Pathological voice detection; end-to-end architecture; multi-scale convolution; spatial-temporal feature; hybrid model |
url | https://ieeexplore.ieee.org/document/9938443/ |
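
The description above mentions a multi-scale convolution block applied to raw speech. The following is a minimal NumPy sketch of that general idea only, not the authors' implementation: several 1-D filters of different lengths run in parallel over the same waveform and their outputs are stacked as channels. The kernel sizes, the random filters, the ReLU, the pooling step, and the 16 kHz sampling rate are all assumptions made for illustration; the record does not state any of them.

```python
import numpy as np


def conv1d_same(signal, kernel):
    """Cross-correlate a 1-D signal with a kernel, zero-padded to keep the length."""
    # np.convolve flips the kernel, so reverse it first to get correlation.
    return np.convolve(signal, kernel[::-1], mode="same")


def multi_scale_block(signal, kernels):
    """Apply each kernel to the same input, then ReLU, and stack as channels."""
    branches = [np.maximum(conv1d_same(signal, k), 0.0) for k in kernels]
    return np.stack(branches, axis=0)  # shape: (num_scales, num_samples)


rng = np.random.default_rng(0)

# Stand-in for one second of raw speech (assumed 16 kHz sampling rate).
raw_speech = rng.standard_normal(16000)

# Hypothetical kernel lengths: short, medium, and long temporal context.
kernel_sizes = (9, 33, 129)
kernels = [rng.standard_normal(size) / size for size in kernel_sizes]

features = multi_scale_block(raw_speech, kernels)

# Global average pooling gives one value per scale; in a hybrid setup such a
# fixed-length vector is the kind of input an SVM classifier could consume.
pooled = features.mean(axis=1)
```

In a trained model the filters would be learned rather than random, and the spatial-temporal feature block and classifier named in the description would follow this stage.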