Summary: | A speaker recognition system identifies a person from his or her voice. It does so by extracting features from the voice signal, and one of the most popular feature extraction techniques is the Mel-Frequency Cepstral Coefficient (MFCC). Improving MFCC remains an open challenge. This work proposes a technique based on MFCC that modifies the filter bank stage and adds a spectrogram, with the aim of achieving higher recognition accuracy than the original MFCC. The resulting technique, called the Mel-weighted spectrogram, outputs spectrogram images that contain the features of the voices. These images are then classified in a dissimilarity space based on Euclidean distance to identify the speaker. The dataset consists of 315 recorded voice signals from 3 speakers, each pronouncing each of five words 21 times across seven different days. Performance is evaluated by comparing the accuracy of the proposed technique with that of the original MFCC and two other MFCC-based techniques. The proposed technique outperforms the three other techniques, with an accuracy of up to 88.57%. These results suggest that the Mel-weighted spectrogram can be recommended for obtaining a higher recognition rate in speaker recognition systems.
|
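
The pipeline summarized above (a mel filter bank applied to a spectrogram, followed by classification in a Euclidean dissimilarity space) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give the frame length, hop size, number of filters, compression, prototype selection, or decision rule, so `n_fft`, `hop`, `n_filters`, the log compression, the choice of prototypes, and the nearest-neighbour rule below are all assumptions made for illustration. It also assumes all recordings have the same length, so the spectrogram images share one shape.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (assumed; the summary does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.
    Returns an array of shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_weighted_spectrogram(signal, sample_rate, n_fft=512, hop=256, n_filters=26):
    """Log mel-filtered power spectrogram, shape (n_filters, n_frames).
    All default parameter values are illustrative assumptions."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10).T

def dissimilarity_vector(image, prototypes):
    """Represent an image by its Euclidean distances to a set of prototype images."""
    return np.array([np.linalg.norm(image - p) for p in prototypes])

def classify(image, prototypes, train_images, train_labels):
    """Nearest neighbour in the dissimilarity space (assumed decision rule)."""
    query = dissimilarity_vector(image, prototypes)
    train = np.array([dissimilarity_vector(x, prototypes) for x in train_images])
    nearest = int(np.argmin(np.linalg.norm(train - query, axis=1)))
    return train_labels[nearest]
```

In this sketch each spectrogram image is first mapped to a vector of distances to the prototypes, and the decision is made in that vector space rather than directly on the images. Which images serve as prototypes is left to the caller.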