Speaker Identification Using a Convolutional Neural Network

Speech, a mode of communication between humans and machines, has various applications, including biometric systems that identify people who have access to secure systems. Feature extraction is an important factor in achieving high accuracy in speech recognition. Therefore, we implemented a spectrogram, which...

Bibliographic Details
Main Authors:	Suci Dwijayanti, Alvio Yunita Putri, Bhakti Yudho Suprapto
Format:	Article
Language:	English
Published:	Ikatan Ahli Informatika Indonesia 2022-02-01
Series:	Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Subjects:	speaker identification; cnn; spectrogram; feature extraction
Online Access:	http://jurnal.iaii.or.id/index.php/RESTI/article/view/3795

_version_	1827364834975416320
author	Suci Dwijayanti, Alvio Yunita Putri, Bhakti Yudho Suprapto
author_facet	Suci Dwijayanti, Alvio Yunita Putri, Bhakti Yudho Suprapto
author_sort	Suci Dwijayanti
collection	DOAJ
description	Speech, a mode of communication between humans and machines, has various applications, including biometric systems that identify people who have access to secure systems. Feature extraction is an important factor in achieving high accuracy in speech recognition. Therefore, we used a spectrogram, a pictorial representation of speech in terms of raw features, to identify speakers. These features were fed into a convolutional neural network (CNN), and a CNN-visual geometry group (CNN-VGG) architecture was used to recognize the speakers. We used 780 primary data samples from 78 speakers, with each speaker uttering a number in Bahasa Indonesia. The proposed architecture, CNN-VGG-f, was trained with a learning rate of 0.001, a batch size of 256, and 100 epochs. The results indicate that this architecture can generate a suitable model for speaker identification. A spectrogram was used to determine the best features for identifying the speakers. The proposed method achieved an accuracy of 98.78%, which is significantly higher than the accuracies of the methods using Mel-frequency cepstral coefficients (MFCCs; 34.62%) and the combination of MFCCs and deltas (26.92%). Overall, CNN-VGG-f with the spectrogram can identify 77 speakers from the samples, validating the usefulness of combining spectrograms and CNNs in speech recognition applications.
first_indexed	2024-03-08T08:17:03Z
format	Article
id	doaj.art-1b381347f0c1447f8c9aad51f2f2a8f2
institution	Directory of Open Access Journals
issn	2580-0760
language	English
last_indexed	2024-03-08T08:17:03Z
publishDate	2022-02-01
publisher	Ikatan Ahli Informatika Indonesia
record_format	Article
series	Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
spelling	doaj.art-1b381347f0c1447f8c9aad51f2f2a8f2 | 2024-02-02T06:58:52Z | eng | Ikatan Ahli Informatika Indonesia | Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) | 2580-0760 | 2022-02-01 | 6 | 1 | 140 | 145 | 10.29207/resti.v6i1.3795 | 3795 | Speaker Identification Using a Convolutional Neural Network | Suci Dwijayanti (0), Alvio Yunita Putri (1), Bhakti Yudho Suprapto (2) | Universitas Sriwijaya; Teknik Elektro Universitas Sriwijaya; Teknik Elektro Universitas Sriwijaya | http://jurnal.iaii.or.id/index.php/RESTI/article/view/3795 | speaker identification; cnn; spectrogram; feature extraction
spellingShingle	Suci Dwijayanti, Alvio Yunita Putri, Bhakti Yudho Suprapto | Speaker Identification Using a Convolutional Neural Network | Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) | speaker identification; cnn; spectrogram; feature extraction
title	Speaker Identification Using a Convolutional Neural Network
title_full	Speaker Identification Using a Convolutional Neural Network
title_fullStr	Speaker Identification Using a Convolutional Neural Network
title_full_unstemmed	Speaker Identification Using a Convolutional Neural Network
title_short	Speaker Identification Using a Convolutional Neural Network
title_sort	speaker identification using a convolutional neural network
topic	speaker identification; cnn; spectrogram; feature extraction
url	http://jurnal.iaii.or.id/index.php/RESTI/article/view/3795
work_keys_str_mv	AT sucidwijayanti speakeridentificationusingaconvolutionalneuralnetwork AT alvioyunitaputri speakeridentificationusingaconvolutionalneuralnetwork AT bhaktiyudhosuprapto speakeridentificationusingaconvolutionalneuralnetwork
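
The description in this record mentions using a spectrogram of each utterance as the input to a CNN. As a rough illustration only, the sketch below computes a log-magnitude spectrogram with NumPy. The function name `log_spectrogram`, the frame length, hop size, FFT size, and the 16 kHz sampling rate are all assumptions chosen for the example; the record does not state the settings the authors used, and this is not their implementation.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Return a log-magnitude spectrogram of shape (n_frames, n_fft // 2 + 1).

    All default parameter values are illustrative assumptions,
    not values taken from the article.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Pad short clips with zeros so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping frames and apply a Hann window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft zero-pads each frame to n_fft samples before transforming.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # A small constant avoids taking the logarithm of zero.
    return np.log(magnitude + 1e-10)


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate in Hz
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000 * t)  # synthetic 1 kHz test tone, not speech
    spec = log_spectrogram(tone)
    peak_bin = int(spec.mean(axis=0).argmax())
    print("spectrogram shape:", spec.shape)
    print("peak frequency (Hz):", peak_bin * sample_rate / 512)
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes it a natural input for a convolutional network such as the VGG-style architecture named in the description.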

Similar Items