Research on Semi-Supervised Sound Event Detection Based on Mean Teacher Models Using ML-LoBCoD-NET

One of the most common approaches to sound event detection is the convolutional neural network (CNN), the convolutional recurrent neural network (CRNN), or one of their variants. However, the pooling operation in a CNN discards the location information of the target object....

Bibliographic Details
Main Authors:	Jinjia Wang, Jing Xia, Qian Yang, Yuzhen Zhang
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Sound event detection; weakly-supervised learning; semi-supervised learning; mean teacher model; multi-layer local block coordinate descent; convolutional recurrent neural network
Online Access:	https://ieeexplore.ieee.org/document/9000951/

_version_	1818616914638798848
author	Jinjia Wang; Jing Xia; Qian Yang; Yuzhen Zhang
author_sort	Jinjia Wang
collection	DOAJ
description	One of the most common approaches to sound event detection is the convolutional neural network (CNN), the convolutional recurrent neural network (CRNN), or one of their variants. However, the pooling operation in a CNN discards the location information of the target object. We drop the pooling operation, retain the ReLU and convolution operations, and use the strong dictionary constraints and the penalty-function prior constraints of multi-layer convolutional sparse coding (ML-CSC). We propose an iterative deep neural network, the unfolded multi-layer local block coordinate descent network (ML-LoBCoD-NET), driven by the multi-layer local block coordinate descent (ML-LoBCoD) algorithm, which extends the local block coordinate descent (LoBCoD) algorithm. The ML-LoBCoD-NET can extract features different from those of a CNN. More importantly, for the weakly-supervised sound event detection task, we propose the MRNN-Att network, which combines the ML-LoBCoD-NET, a recurrent neural network (RNN), and an attention network. The MCRNN-Att network combines the MRNN-Att and CRNN networks to fuse their different features. Furthermore, for the semi-supervised sound event detection task, we propose the MRNN-Att mean teacher model (MRNN-Att-MT) and the MCRNN-Att mean teacher model (MCRNN-Att-MT), in which the MRNN-Att and MCRNN-Att networks serve as the student models. These models were tested on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 4 dataset. The MRNN-Att-MT achieved an F1 score of 22.83% on the development set, 8.77 percentage points higher than the baseline system, and 15.68% on the evaluation set, 4.88 percentage points higher than the baseline. The MCRNN-Att-MT achieved an F1 score of 20.35% on the development set, 6.29 percentage points higher than the baseline, and 14.56% on the evaluation set, 3.76 percentage points higher than the baseline.
first_indexed	2024-12-16T16:57:22Z
format	Article
id	doaj.art-ea46451d61154bd7a2436ec7180e6e82
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-12-16T16:57:22Z
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-ea46451d61154bd7a2436ec7180e6e82; 2022-12-21T22:23:51Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; vol. 8, pp. 38032-38044; doi:10.1109/ACCESS.2020.2974479; 9000951; Research on Semi-Supervised Sound Event Detection Based on Mean Teacher Models Using ML-LoBCoD-NET; Jinjia Wang (https://orcid.org/0000-0002-2210-5570), Jing Xia (https://orcid.org/0000-0001-5245-562X), Qian Yang (https://orcid.org/0000-0002-6552-7482), Yuzhen Zhang (https://orcid.org/0000-0001-7655-4470); School of Information Science and Engineering, Yanshan University, Qinhuangdao, China (all authors); https://ieeexplore.ieee.org/document/9000951/; Sound event detection, weakly-supervised learning, semi-supervised learning, mean teacher model, multi-layer local block coordinate descent, convolutional recurrent neural network
title	Research on Semi-Supervised Sound Event Detection Based on Mean Teacher Models Using ML-LoBCoD-NET
topic	Sound event detection; weakly-supervised learning; semi-supervised learning; mean teacher model; multi-layer local block coordinate descent; convolutional recurrent neural network
url	https://ieeexplore.ieee.org/document/9000951/
work_keys_str_mv	AT jinjiawang researchonsemisupervisedsoundeventdetectionbasedonmeanteachermodelsusingmllobcodnet AT jingxia researchonsemisupervisedsoundeventdetectionbasedonmeanteachermodelsusingmllobcodnet AT qianyang researchonsemisupervisedsoundeventdetectionbasedonmeanteachermodelsusingmllobcodnet AT yuzhenzhang researchonsemisupervisedsoundeventdetectionbasedonmeanteachermodelsusingmllobcodnet
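
The description names a mean teacher setup in which the MRNN-Att or MCRNN-Att network is the student model. As a rough illustration of the general mean teacher idea only, the sketch below shows an exponential-moving-average teacher update and a combined loss. It does not reproduce the authors' networks or settings: the function names, the `alpha` decay value, the binary cross-entropy supervised term, and the mean-squared-error consistency term are all assumptions chosen for the example.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Update teacher parameters as an exponential moving average of the student's.

    `teacher` and `student` are hypothetical dicts mapping parameter names to arrays.
    `alpha` is an assumed decay value, not one taken from the record.
    """
    for name, s_param in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * s_param
    return teacher

def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions (assumed choice)."""
    return float(np.mean((student_probs - teacher_probs) ** 2))

def bce(probs, labels, eps=1e-7):
    """Binary cross-entropy averaged over all clips and classes."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

def mean_teacher_loss(student_probs, teacher_probs, labels, labeled_mask,
                      consistency_weight=1.0):
    """Supervised loss on labeled clips plus consistency loss on all clips.

    `labeled_mask` is a boolean array marking which clips have labels;
    it is assumed to contain at least one True entry.
    """
    supervised = bce(student_probs[labeled_mask], labels[labeled_mask])
    consistency = consistency_loss(student_probs, teacher_probs)
    return supervised + consistency_weight * consistency
```

In a training loop of this kind, the student would be updated by gradient descent on the combined loss, and `ema_update` would be called after each student step, so the unlabeled clips contribute only through the consistency term.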
