Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments

In this paper, we propose joint optimization of deep neural network (DNN)-supported dereverberation and beamforming for the convolutional recurrent neural network (CRNN)-based sound event detection (SED) in multi-channel environments. First, the short-time Fourier transform (STFT) coefficients are c...

Bibliographic Details
Main Authors:	Kyoungjin Noh, Joon-Hyuk Chang
Format:	Article
Language:	English
Published:	MDPI AG 2020-03-01
Series:	Sensors
Subjects:	sound event detection; dereverberation; acoustic beamforming; convolutional recurrent neural network; joint optimization
Online Access:	https://www.mdpi.com/1424-8220/20/7/1883
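
The full abstract of this record describes an MVDR beamformer whose statistics come from DNN-estimated source and noise masks. The sketch below illustrates that idea for a single frequency bin in NumPy. It assumes the reference-channel (Souden-style) MVDR formulation and simple diagonal loading, neither of which is confirmed by this record; the function names `masked_psd`, `mvdr_weights`, and `apply_beamformer` are hypothetical, and the masks are taken as given rather than predicted by a network.

```python
import numpy as np

def masked_psd(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance for one frequency bin.

    stft: complex array of shape (channels, frames)
    mask: real array of shape (frames,) with values in [0, 1]
    """
    weighted = stft * mask[None, :]
    # Sum over frames of m_t * x_t x_t^H, normalised by the mask mass.
    return weighted @ stft.conj().T / max(mask.sum(), eps)

def mvdr_weights(psd_source, psd_noise, ref=0, diag_load=1e-6):
    """Reference-channel MVDR weights (assumed formulation, see lead-in)."""
    channels = psd_noise.shape[0]
    # Diagonal loading keeps the noise covariance well conditioned.
    load = diag_load * np.trace(psd_noise).real / channels
    psd_noise = psd_noise + load * np.eye(channels)
    numerator = np.linalg.solve(psd_noise, psd_source)  # inv(noise) @ source
    return numerator[:, ref] / np.trace(numerator)

def apply_beamformer(weights, stft):
    """Collapse (channels, frames) to a single-channel (frames,) output."""
    return weights.conj() @ stft
```

In a jointly trained system the masks would be produced by a network and every step would need to be differentiable; this NumPy version only shows the signal-processing arithmetic.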

_version_	1827761340392931328
author	Kyoungjin Noh, Joon-Hyuk Chang
author_facet	Kyoungjin Noh, Joon-Hyuk Chang
author_sort	Kyoungjin Noh
collection	DOAJ
description	This paper proposes the joint optimization of deep neural network (DNN)-supported dereverberation and beamforming for convolutional recurrent neural network (CRNN)-based sound event detection (SED) in multi-channel environments. First, short-time Fourier transform (STFT) coefficients are calculated from multi-channel audio signals recorded in noisy and reverberant environments and are then enhanced by DNN-supported weighted prediction error (WPE) dereverberation using the estimated masks. Next, the STFT coefficients of the dereverberated multi-channel signals are passed to the DNN-supported minimum variance distortionless response (MVDR) beamformer, which uses source and noise masks estimated by the DNN. The resulting single-channel enhanced STFT coefficients are fed to the CRNN-based SED system, and the three modules are jointly trained with a single loss function designed for SED. Furthermore, to ease the difficulty of training a deep learning model for SED caused by the imbalance in the amount of data per class, focal loss is used as the loss function. Experimental results show that joint training of DNN-supported dereverberation and beamforming with the SED model under the supervision of focal loss significantly improves performance in noisy and reverberant environments.
first_indexed	2024-03-11T10:09:34Z
format	Article
id	doaj.art-1c0a8a0413874b56b1261082ea00b604
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-11T10:09:34Z
publishDate	2020-03-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-1c0a8a0413874b56b1261082ea00b604; 2023-11-16T14:33:56Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2020-03-01; vol. 20, no. 7, art. 1883; DOI 10.3390/s20071883; Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments; Kyoungjin Noh, Joon-Hyuk Chang (both: Department of Electronics and Computer Engineering, Hanyang University, Seoul 04763, Korea); https://www.mdpi.com/1424-8220/20/7/1883; sound event detection; dereverberation; acoustic beamforming; convolutional recurrent neural network; joint optimization
title	Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments
title_sort	joint optimization of deep neural network based dereverberation and beamforming for sound event detection in multi channel environments
topic	sound event detection; dereverberation; acoustic beamforming; convolutional recurrent neural network; joint optimization
url	https://www.mdpi.com/1424-8220/20/7/1883
work_keys_str_mv	AT kyoungjinnoh jointoptimizationofdeepneuralnetworkbaseddereverberationandbeamformingforsoundeventdetectioninmultichannelenvironments AT joonhyukchang jointoptimizationofdeepneuralnetworkbaseddereverberationandbeamformingforsoundeventdetectioninmultichannelenvironments
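
The description field above states that focal loss is used to counter the imbalance in the amount of data per class. As a rough illustration, the NumPy sketch below computes a binary (per-class, per-frame) focal loss of the kind commonly paired with sigmoid outputs in multi-label SED. The function name `binary_focal_loss` is hypothetical, and the defaults `gamma=2.0` and `alpha=0.25` are widely used values assumed here, not settings taken from this record.

```python
import numpy as np

def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over all frames and classes.

    probs:   predicted event probabilities in [0, 1], any shape
    targets: binary labels (0 or 1) with the same shape as probs
    gamma:   focusing parameter (assumed default); 0 disables focusing
    alpha:   weight on positive targets (assumed default)
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    # p_t is the probability the model assigns to the true label.
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the contribution of well-classified entries.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return float(np.mean(loss))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which gives a convenient sanity check; larger `gamma` values put relatively more weight on hard, misclassified frames.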

Similar Items