A Mel-weighted Spectrogram Feature Extraction for Improved Speaker Recognition System

A speaker recognition system is intended to recognize a person’s identity. This task can be done by extracting features from the speaker’s voice signal using a feature extraction technique. One of the most popular feature extraction techniques is Mel-Frequency Cepstral Co...


Bibliographic Details
Main Authors: Astuti, Yenni, Hidayat, Risanuri, Bejo, Agus
Format: Article
Language: English
Published: Hindawi 2022
Subjects: Electrical and Electronic Engineering; Engineering
Online Access: https://repository.ugm.ac.id/278617/1/Astuti_TK.pdf
_version_ 1826050289947377664
author Astuti, Yenni
Hidayat, Risanuri
Bejo, Agus
author_facet Astuti, Yenni
Hidayat, Risanuri
Bejo, Agus
author_sort Astuti, Yenni
collection UGM
description A speaker recognition system is intended to recognize a person’s identity. This task can be done by extracting features from the speaker’s voice signal using a feature extraction technique. One of the most popular feature extraction techniques is the Mel-Frequency Cepstral Coefficient (MFCC). The MFCC remains a challenging technique to improve. In this work, a technique based on the MFCC is proposed. The original MFCC is modified in the filter bank stage and by adding a spectrogram. The goal of this work is to obtain better recognition accuracy than the original MFCC by applying a technique called the Mel-weighted spectrogram. The output of the proposed technique is a set of spectrogram images that contain the features of the voices. The spectrogram images are then classified using a dissimilarity space based on Euclidean distance to identify the speaker. The dataset consists of 315 recorded voice signals from 3 speakers, each pronouncing five words 21 times over seven different days. The performance of the system is evaluated by comparing the accuracy of the proposed technique with that of the original MFCC and two other MFCC-based techniques. The proposed technique outperforms the three other techniques, with an accuracy of up to 88.57%. Based on these results, the Mel-weighted spectrogram can be recommended for obtaining a higher recognition rate in speaker recognition systems.
first_indexed 2024-03-14T00:01:33Z
format Article
id oai:generic.eprints.org:278617
institution Universitas Gadjah Mada
language English
last_indexed 2024-03-14T00:01:33Z
publishDate 2022
publisher Hindawi
record_format dspace
spelling oai:generic.eprints.org:278617 2023-11-02T01:57:58Z https://repository.ugm.ac.id/278617/ A Mel-weighted Spectrogram Feature Extraction for Improved Speaker Recognition System Astuti, Yenni Hidayat, Risanuri Bejo, Agus Electrical and Electronic Engineering Engineering
Hindawi 2022 Article PeerReviewed application/pdf en https://repository.ugm.ac.id/278617/1/Astuti_TK.pdf Astuti, Yenni and Hidayat, Risanuri and Bejo, Agus (2022) A Mel-weighted Spectrogram Feature Extraction for Improved Speaker Recognition System. International Journal of Intelligent Engineering and Systems, 15 (6). pp. 74-82. ISSN 2185-3118 https://www.hindawi.com/journals/ijis/ 10.22266/ijies2022.1231.08
title A Mel-weighted Spectrogram Feature Extraction for Improved Speaker Recognition System
title_sort mel weighted spectrogram feature extraction for improved speaker recognition system
topic Electrical and Electronic Engineering
Engineering
url https://repository.ugm.ac.id/278617/1/Astuti_TK.pdf
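
The description in this record outlines a pipeline: a mel filter bank applied to a spectrogram, followed by classification based on Euclidean distance. The following is a minimal sketch of that general idea, not the authors' implementation. The record does not say how the filter bank was modified, so the triangular filters, the frame sizes, the log compression, and the nearest-prototype decision rule are all assumptions, and every function name, the synthetic tones, and the speaker labels are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (assumed; the record does not specify one).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_weighted_spectrogram(signal, sample_rate, n_fft=512, hop=160, n_filters=20):
    """Log mel-weighted power spectrogram, shape (n_filters, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power spectrum per frame
    mel_energy = power @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10).T


def dissimilarity_vector(image, prototypes):
    """Euclidean distance from one spectrogram image to each prototype image."""
    return np.array([np.linalg.norm(image - p) for p in prototypes])


def classify(image, prototypes, labels):
    """Simplified decision rule (assumption): pick the closest prototype."""
    return labels[int(np.argmin(dissimilarity_vector(image, prototypes)))]


def demo(seed=0):
    """Three hypothetical 'speakers' simulated as noisy tones of equal length."""
    rng = np.random.default_rng(seed)
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    tones = {"speaker_a": 200.0, "speaker_b": 600.0, "speaker_c": 1500.0}

    def record(freq):
        return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(t.size)

    labels = list(tones)
    prototypes = [mel_weighted_spectrogram(record(f), sample_rate) for f in tones.values()]
    query = mel_weighted_spectrogram(record(600.0), sample_rate)
    return classify(query, prototypes, labels)


if __name__ == "__main__":
    print(demo())
```

The sketch assumes all recordings have the same length so that the spectrogram images can be compared element by element. Real recordings would need length normalization first, and a full dissimilarity-space classifier would typically train a separate classifier on the distance vectors rather than take the minimum directly.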