Ethnic recognition system for Malay language speakers using gammatone frequency cepstral coefficients pitch (GFCCP) and pattern classification

Bibliographic Details
Main Author: Mohd Hanifa, Rafizah
Format: Thesis
Language: English
Published: 2022
Subjects: T Technology (General)
Online Access: http://eprints.uthm.edu.my/10809/1/24p%20RAFIZAH%20MOHD%20HANIFA.pdf
http://eprints.uthm.edu.my/10809/2/RAFIZAH%20MOHD%20HANIFA%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/10809/3/RAFIZAH%20MOHD%20HANIFA%20WATERMARK.pdf
author Mohd Hanifa, Rafizah
collection UTHM
description Malaysia is a multiracial, multilingual country whose ethnic groups include the Malay, Chinese, Indian, and Bumiputera communities. Malay is a non-tonal language that does not rely on lexical stress. Recognizing a speaker's ethnicity is important because it has many potential applications, such as improving human-robot interaction, audio forensics, telephone banking, and electronic commerce. Feature extraction, text independence, and variability coverage are issues related to speaker recognition systems. The research focused on establishing a novel method, Gammatone Frequency Cepstral Coefficients and Pitch (GFCCP), which was coupled with the K-Nearest Neighbours (KNN) classifier in a text-independent system to identify the speaker's ethnicity. The speech corpus consisted of readings of Malay texts by speakers of both genders, aged 10 to 48 years, classified into three ethnic groups: Malay, Chinese, and Indian. GFCC and Mel Frequency Cepstral Coefficients (MFCC) were used to represent the human auditory system. Pitch was added to MFCC and GFCC because it contributes to the differences between human voices and is difficult to imitate. Naïve Bayes, Support Vector Machine (SVM), and KNN classifiers were used to quantify the pattern classification performance. The dataset was split by hold-out validation (80% training, 20% testing). The system's performance was assessed on validation and prediction accuracy. The results revealed that GFCCP obtained the highest validation and prediction accuracy with the KNN classifier. The validation accuracy was 100%, 99.6%, and 99.2% for 12, 24, and 34 speakers, respectively, while the prediction accuracy was 89.98%, 73.56%, and 72.36% for 12, 24, and 34 speakers, respectively. An important finding of the study is that, under noisy conditions, combining pitch with MFCC or GFCC gave better accuracy than MFCC or GFCC alone, with the GFCC combination performing better than the MFCC combination.
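The pipeline outlined in the description (cepstral features with pitch appended, an 80%/20% hold-out split, and a KNN classifier) might be sketched roughly as follows. This is an illustration only, not the thesis implementation: the function names, the synthetic frames, the choice of 13 coefficients, k = 3, and the z-score scaling are all assumptions, and no real GFCC extraction or pitch tracking is performed.

```python
import numpy as np


def append_pitch(cepstra, pitch):
    """Append one pitch value per frame as an extra feature column."""
    cepstra = np.asarray(cepstra, dtype=float)
    pitch = np.asarray(pitch, dtype=float).reshape(-1, 1)
    return np.hstack([cepstra, pitch])


def holdout_split(features, labels, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into training and testing parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    cut = int(round(train_fraction * len(features)))
    train, test = order[:cut], order[cut:]
    return features[train], labels[train], features[test], labels[test]


def zscore(train_x, other_x):
    """Scale both sets with the training mean and standard deviation.

    Assumption: scaling is added here so that pitch (in Hz) does not
    dominate the Euclidean distance; the source does not mention it.
    """
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0) + 1e-12
    return (train_x - mean) / std, (other_x - mean) / std


def knn_predict(train_x, train_y, query_x, k=3):
    """Label each query row by majority vote among its k nearest rows."""
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in train_y[nearest]])


def make_synthetic_frames(n_per_class=50, n_coeffs=13, seed=0):
    """Hypothetical stand-in data: three classes of random 'frames'."""
    rng = np.random.default_rng(seed)
    cepstra, pitch, labels = [], [], []
    for label in range(3):
        cepstra.append(rng.normal(label, 0.5, size=(n_per_class, n_coeffs)))
        pitch.append(rng.normal(120 + 40 * label, 10, size=n_per_class))
        labels.append(np.full(n_per_class, label))
    return np.vstack(cepstra), np.concatenate(pitch), np.concatenate(labels)


if __name__ == "__main__":
    cepstra, pitch, labels = make_synthetic_frames()
    features = append_pitch(cepstra, pitch)
    train_x, train_y, test_x, test_y = holdout_split(features, labels)
    train_x, test_x = zscore(train_x, test_x)
    predicted = knn_predict(train_x, train_y, test_x, k=3)
    print("hold-out accuracy on synthetic data:", np.mean(predicted == test_y))
```

Any accuracy printed by this sketch reflects the synthetic data only and says nothing about the figures reported in the thesis.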
first_indexed 2024-09-24T00:08:37Z
format Thesis
id uthm.eprints-10809
institution Universiti Tun Hussein Onn Malaysia
language English
last_indexed 2024-09-24T00:08:37Z
publishDate 2022
record_format dspace
spelling uthm.eprints-10809 2024-05-13T06:56:32Z http://eprints.uthm.edu.my/10809/ Mohd Hanifa, Rafizah T Technology (General) 2022-03 Thesis NonPeerReviewed Mohd Hanifa, Rafizah (2022) Ethnic recognition system for Malay language speakers using gammatone frequency cepstral coefficients pitch (GFCCP) and pattern classification. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.
title Ethnic recognition system for Malay language speakers using gammatone frequency cepstral coefficients pitch (GFCCP) and pattern classification
topic T Technology (General)
url http://eprints.uthm.edu.my/10809/1/24p%20RAFIZAH%20MOHD%20HANIFA.pdf
http://eprints.uthm.edu.my/10809/2/RAFIZAH%20MOHD%20HANIFA%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/10809/3/RAFIZAH%20MOHD%20HANIFA%20WATERMARK.pdf