Development of a regional voice dataset and speaker classification based on machine learning
Abstract At present, voice biometrics are commonly used for identification and authentication of users through their voice. Voice based services such as mobile banking, access to personal devices, and logging into social networks are the common examples of authenticating users through voice biometri...
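Main Authors: | Muhammad Ismail, Shahzad Memon, Lachhman Das Dhomeja, Shahid Munir Shah, Dostdar Hussain, Sabit Rahim, Imran Ali |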
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2021-03-01 |
Series: | Journal of Big Data |
Subjects: | Speaker recognition systems, Speakers classification, Voice database, Accents and dialects |
Online Access: | https://doi.org/10.1186/s40537-021-00435-9 |
_version_ | 1823927497917464576 |
---|---|
author | Muhammad Ismail; Shahzad Memon; Lachhman Das Dhomeja; Shahid Munir Shah; Dostdar Hussain; Sabit Rahim; Imran Ali |
author_facet | Muhammad Ismail; Shahzad Memon; Lachhman Das Dhomeja; Shahid Munir Shah; Dostdar Hussain; Sabit Rahim; Imran Ali |
author_sort | Muhammad Ismail |
collection | DOAJ |
description | Abstract At present, voice biometrics are commonly used to identify and authenticate users by their voice. Voice-based services such as mobile banking, access to personal devices, and logging into social networks are common examples of authenticating users through voice biometrics. In Pakistan, voice-based services are very common in the banking and mobile/cellular sectors; however, these services do not use voice features to recognize customers. Therefore, the chance of these services being used with a false identity is always high. It is essential to design a voice-based recognition system to minimize the risk of false identity. In this paper, we developed regional voice datasets for voice biometrics by collecting voice data in different local accents of Pakistan. Although there is a global need for voice biometrics, especially where voice-based services are common, this paper uses Pakistan as a use case to show how to build a regional voice dataset for voice biometrics. To build the voice dataset, voice samples were recorded from 180 male and female speakers in two languages, English and Urdu, across five regional accents. Mel Frequency Cepstral Coefficient (MFCC) features were extracted from the collected voice samples to train Support Vector Machine (SVM), Artificial Neural Network (ANN), Random Forest (RF) and K-nearest neighbor (KNN) classifiers. The results indicate that ANN outperformed SVM, RF and KNN, achieving 88.53% and 86.58% recognition accuracy on the two datasets, respectively. |
first_indexed | 2024-12-16T20:40:06Z |
format | Article |
id | doaj.art-2a2e67eeb826450db58161e5908467c3 |
institution | Directory Open Access Journal |
issn | 2196-1115 |
language | English |
last_indexed | 2024-12-16T20:40:06Z |
publishDate | 2021-03-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Big Data |
spelling | doaj.art-2a2e67eeb826450db58161e5908467c3; 2022-12-21T22:17:07Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2021-03-01; 8(1), 1-18; 10.1186/s40537-021-00435-9; Development of a regional voice dataset and speaker classification based on machine learning; Muhammad Ismail (AHS Bukhari Institute of Information and Communication Technology, Faculty of Engineering and Technology, University of Sindh); Shahzad Memon (AHS Bukhari Institute of Information and Communication Technology, Faculty of Engineering and Technology, University of Sindh); Lachhman Das Dhomeja (AHS Bukhari Institute of Information and Communication Technology, Faculty of Engineering and Technology, University of Sindh); Shahid Munir Shah (Department of Computer Science, Barrett Hodgson University); Dostdar Hussain (Department of Computer Science, Karakoram International University); Sabit Rahim (Department of Computer Science, Karakoram International University); Imran Ali (Department of Computer Science, Karakoram International University); https://doi.org/10.1186/s40537-021-00435-9; Speaker recognition systems; Speakers classification; Voice database; Accents and dialects |
title | Development of a regional voice dataset and speaker classification based on machine learning |
topic | Speaker recognition systems; Speakers classification; Voice database; Accents and dialects |
url | https://doi.org/10.1186/s40537-021-00435-9 |
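
The abstract describes extracting MFCC features and training four classifiers (SVM, ANN, RF, KNN) on them. The comparison step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn and NumPy are installed, the feature vectors are synthetic stand-ins for per-utterance MFCC vectors (no audio is read), and every name and hyperparameter here (`make_synthetic_features`, `compare_classifiers`, 13 coefficients, one hidden layer of 32 units, `k = 5`) is hypothetical. It will not reproduce the accuracies reported in the record.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_features(n_speakers=5, samples_per_speaker=40,
                            n_coeffs=13, seed=0):
    """Return (X, y) with one synthetic feature vector per utterance.

    Assumption: each row stands in for an MFCC vector averaged over the
    frames of one recording. These are random numbers, not real speech.
    """
    rng = np.random.default_rng(seed)
    # One cluster centre per speaker, plus unit-variance noise per utterance.
    centers = rng.normal(0.0, 3.0, size=(n_speakers, n_coeffs))
    X = np.vstack([
        c + rng.normal(0.0, 1.0, size=(samples_per_speaker, n_coeffs))
        for c in centers
    ])
    y = np.repeat(np.arange(n_speakers), samples_per_speaker)
    return X, y


def compare_classifiers(X, y, seed=0):
    """Fit the four classifier families and return held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Hyperparameters are illustrative defaults, not the paper's settings.
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "ANN": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                          random_state=seed),
        ),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "KNN": make_pipeline(StandardScaler(),
                             KNeighborsClassifier(n_neighbors=5)),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_features()
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: {acc:.3f}")
```

With real recordings, the synthetic generator would be replaced by an MFCC extraction step over the audio files; which library and frame settings to use for that step is left open here, since the record does not specify them.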