3D-DCDAE: Unsupervised Music Latent Representations Learning Method Based on a Deep 3D Convolutional Denoising Autoencoder for Music Genre Classification

Bibliographic Details
Main Authors:	Lvyang Qiu, Shuyu Li, Yunsick Sung
Format:	Article
Language:	English
Published:	MDPI AG 2021-09-01
Series:	Mathematics
Subjects:	music genre classification; MIDI; autoencoder model; 3D CNN; unsupervised learning
Online Access:	https://www.mdpi.com/2227-7390/9/18/2274

Abstract:	With unlabeled music data widely available, it is necessary to build an unsupervised extractor of latent music representations to improve the performance of classification models. This paper proposes an unsupervised latent music representation learning method based on a deep 3D convolutional denoising autoencoder (3D-DCDAE) for music genre classification. The method learns common representations from a large amount of unlabeled data in order to improve genre classification. Specifically, unlabeled MIDI files are fed to the 3D-DCDAE, which extracts latent representations by denoising and reconstructing the input data; a decoder assists the 3D-DCDAE during training. After 3D-DCDAE training, the decoder is replaced by a multilayer perceptron (MLP) classifier for music genre classification. Through this unsupervised representation learning, unlabeled data can contribute to classification tasks, easing the limit that insufficient labeled data places on classification performance. In addition, the unsupervised 3D-DCDAE can take musicological structure into account, which expands the understanding of the music field and improves performance in music genre classification. In experiments on the Lakh MIDI dataset, a large amount of unlabeled data was used to train the 3D-DCDAE, which reached a denoising and reconstruction accuracy of approximately 98%. A small amount of labeled data was then used to train a classification model consisting of the trained 3D-DCDAE and the MLP classifier, which achieved a classification accuracy of approximately 88%. The experimental results show that the model achieves state-of-the-art performance and significantly outperforms other music genre classification methods when only a small amount of labeled data is available.
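The pipeline summarized in the abstract starts from MIDI data arranged as a 3D volume that is corrupted for denoising and scanned by 3D convolutions. The Python sketch below illustrates only those pieces, under stated assumptions: a hypothetical track × time step × pitch piano-roll layout, a hypothetical bit-flip noise model, a single hand-set 3D kernel, and cell-wise agreement as the reconstruction-accuracy metric. None of the function names or these choices come from the paper, nothing is trained, and the encoder, decoder and MLP classifier are not shown.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical layout: volume[track][time_step][pitch] holds 0.0 or 1.0.
Volume = List[List[List[float]]]


def to_piano_roll(notes: Sequence[Tuple[int, int, int, int]],
                  n_tracks: int, n_steps: int, n_pitches: int = 128) -> Volume:
    """Rasterize (track, pitch, start_step, end_step) note events into a binary 3D volume.

    The track x time x pitch axes are an illustrative assumption, not the paper's format.
    """
    roll = [[[0.0] * n_pitches for _ in range(n_steps)] for _ in range(n_tracks)]
    for track, pitch, start, end in notes:
        # end_step is exclusive; events are clipped to the time grid.
        for t in range(max(start, 0), min(end, n_steps)):
            roll[track][t][pitch] = 1.0
    return roll


def corrupt(roll: Volume, drop_prob: float, add_prob: float,
            rng: random.Random) -> Volume:
    """Build a denoising input: switch active cells off with drop_prob and
    inactive cells on with add_prob. The clean roll stays the target.
    This bit-flip noise model is hypothetical."""
    return [[[(0.0 if rng.random() < drop_prob else 1.0) if v
              else (1.0 if rng.random() < add_prob else 0.0)
              for v in row] for row in plane] for plane in roll]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation (what deep-learning "convolution"
    computes), stride 1, no padding. The kernel slides jointly over tracks,
    time and pitch; a real 3D CNN layer stacks many learned kernels and adds
    biases and a nonlinearity."""
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    out: Volume = []
    for z in range(d - kd + 1):
        plane = []
        for y in range(h - kh + 1):
            row = []
            for x in range(w - kw + 1):
                s = 0.0
                for i in range(kd):
                    for j in range(kh):
                        for k in range(kw):
                            s += volume[z + i][y + j][x + k] * kernel[i][j][k]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def reconstruction_accuracy(pred: Volume, target: Volume,
                            threshold: float = 0.5) -> float:
    """Fraction of cells whose thresholded prediction equals the binary target
    (an assumed definition of reconstruction accuracy)."""
    total = correct = 0
    for p_plane, t_plane in zip(pred, target):
        for p_row, t_row in zip(p_plane, t_plane):
            for p, t in zip(p_row, t_row):
                total += 1
                correct += (p >= threshold) == (t >= 0.5)
    return correct / total


if __name__ == "__main__":
    rng = random.Random(0)
    clean = to_piano_roll([(0, 60, 0, 4), (1, 64, 2, 6)],
                          n_tracks=2, n_steps=8, n_pitches=128)
    noisy = corrupt(clean, drop_prob=0.1, add_prob=0.01, rng=rng)
    kernel = [[[1.0] * 3 for _ in range(2)] for _ in range(2)]  # 2x2x3, all ones
    features = conv3d_valid(noisy, kernel)
    print(len(features), len(features[0]), len(features[0][0]))  # prints: 1 7 126
    # Baseline: an identity "model" that returns its noisy input unchanged.
    print(reconstruction_accuracy(noisy, clean))
```

Running the module prints the feature-map shape for a 2 × 8 × 128 roll and a 2 × 2 × 3 kernel, then the accuracy obtained by simply copying the noisy input, which is the baseline a trained denoising autoencoder has to beat. Because most cells of a piano roll are empty, cell-wise accuracy can look high even for a weak model, so the threshold and metric definition matter when interpreting figures such as 98%.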
DOI:	10.3390/math9182274
Volume/Issue:	Mathematics, vol. 9, no. 18, article 2274
ISSN:	2227-7390
Affiliation:	Department of Multimedia Engineering, Dongguk University-Seoul, Seoul 04620, Korea (all three authors)

Similar Items