Mel-Weighted Single Frequency Filtering Spectrogram for Dialect Identification

In this study, we propose Mel-weighted single frequency filtering (SFF) spectrograms for dialect identification. The spectrum derived using SFF has high spectral resolution for harmonics and resonances while simultaneously maintaining good time-resolution of some speech excitation features such as i...

Bibliographic Details
Main Authors:	Rashmi Kethireddy, Sudarsana Reddy Kadiri, Paavo Alku, Suryakanth V. Gangashetty
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Dialect identification; single frequency filtering (SFF) spectrum; Mel-spectrogram; Mel-filter bank energies; autoencoder
Online Access:	https://ieeexplore.ieee.org/document/9180347/

_version_	1811210616046616576
author	Rashmi Kethireddy, Sudarsana Reddy Kadiri, Paavo Alku, Suryakanth V. Gangashetty
author_facet	Rashmi Kethireddy, Sudarsana Reddy Kadiri, Paavo Alku, Suryakanth V. Gangashetty
author_sort	Rashmi Kethireddy
collection	DOAJ
description	In this study, we propose Mel-weighted single frequency filtering (SFF) spectrograms for dialect identification. The spectrum derived using SFF has high spectral resolution for harmonics and resonances while simultaneously maintaining good time-resolution of some speech excitation features such as impulse-like events. The SFF spectrum can represent speech characteristics such as burst time and glottal closure instances better than the short-time Fourier transform (STFT) spectrum. Our hypothesis is that these intricate representations in the SFF spectrum should help in distinguishing dialects. Therefore, we built a dialect identification system which uses an unsupervised, bottleneck feature representation of the Mel-weighted SFF spectrogram (Mel-SFF spectrogram) with sequence-to-sequence deep autoencoders. The language invariance of the proposed system was evaluated using two datasets: the UT-Podcast database (English) and the STYRIALECT database (German). The proposed representations gave a relative improvement of 9.47% and 4.69% in unweighted average recall (UAR) compared to the best baseline method on the development and test datasets, respectively, of the UT-Podcast database. The proposed representations also gave a comparable performance to the best baseline method for the STYRIALECT database. In addition, the fusion of the autoencoder bottleneck features computed from the Mel-SFF and Mel-STFT spectrograms improved the overall performance indicating complementary information between these features. By further analyzing the performance of the proposed representation with different utterance lengths using the UT-Podcast database, we observed that the proposed representation performed better on short utterances. The improved performance given by the Mel-weighted SFF spectrogram for recognizing dialects in both databases supports our hypothesis.
first_indexed	2024-04-12T04:59:00Z
format	Article
id	doaj.art-2dc2e1826381423daecf4e58784fb8bf
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-04-12T04:59:00Z
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-2dc2e1826381423daecf4e58784fb8bf; 2022-12-22T03:47:03Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 174871-174879; DOI 10.1109/ACCESS.2020.3020506; IEEE document 9180347; Mel-Weighted Single Frequency Filtering Spectrogram for Dialect Identification; Rashmi Kethireddy (https://orcid.org/0000-0002-3047-8158; Speech Processing Laboratory, International Institute of Information Technology-Hyderabad (IIIT-H), Hyderabad, India); Sudarsana Reddy Kadiri (https://orcid.org/0000-0001-5806-3053; Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland); Paavo Alku (https://orcid.org/0000-0002-8173-9418; Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland); Suryakanth V. Gangashetty (https://orcid.org/0000-0001-6745-4363; Speech Processing Laboratory, International Institute of Information Technology-Hyderabad (IIIT-H), Hyderabad, India); https://ieeexplore.ieee.org/document/9180347/; Dialect identification; single frequency filtering (SFF) spectrum; Mel-spectrogram; Mel-filter bank energies; autoencoder
title	Mel-Weighted Single Frequency Filtering Spectrogram for Dialect Identification
topic	Dialect identification; single frequency filtering (SFF) spectrum; Mel-spectrogram; Mel-filter bank energies; autoencoder
url	https://ieeexplore.ieee.org/document/9180347/

Similar Items