Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization

Abstract The goal of sound event detection and localization (SELD) is to identify each individual sound event class and its activity time from a piece of audio, while estimating its spatial location at the time of activity. Conformer combines the advantages of convolutional layers and Transformer, w...

Bibliographic Details
Main Authors:	Yuting Zhou, Hongjie Wan
Format:	Article
Language:	English
Published:	SpringerOpen 2023-06-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Subjects:	Sound event detection and localization, Conformer, Attention mechanism, Multi-task learning, Soft parameter sharing
Online Access:	https://doi.org/10.1186/s13636-023-00292-9

_version_	1797789663601098752
author	Yuting Zhou Hongjie Wan
author_facet	Yuting Zhou Hongjie Wan
author_sort	Yuting Zhou
collection	DOAJ
description	Abstract The goal of sound event detection and localization (SELD) is to identify each individual sound event class and its activity time from a piece of audio, while estimating its spatial location at the time of activity. Conformer combines the advantages of convolutional layers and Transformer, which is effective in tasks such as speech recognition. However, it achieves high performance relying on complex network structure and a large number of computations. In the SELD task of this paper, we propose to use an encoder with a simpler network structure, called the dual-branch attention module (DBAM). The module is improved based on the conformer using two parallel branches of attention and convolution, which can model both global and local contextual information. We also blend low-level and high-level features of the localization task. In addition, we add soft parameter sharing to the joint SELD network, which can efficiently exploit the potential relationship between the two subtasks, SED and DOA. The proposed method can effectively detect two sound events with overlapping occurrence in the same time period. We experimented with the open dataset DCASE 2020 task 3 proving that the proposed method achieves better SELD performance than the baseline model. Furthermore, we conducted ablation experiments for verifying the effectiveness of the dual-branch attention module and soft parameter sharing.
first_indexed	2024-03-13T01:53:52Z
format	Article
id	doaj.art-d61bc4c192794481983ba08b224b34df
institution	Directory of Open Access Journals
issn	1687-4722
language	English
last_indexed	2024-03-13T01:53:52Z
publishDate	2023-06-01
publisher	SpringerOpen
record_format	Article
series	EURASIP Journal on Audio, Speech, and Music Processing
spelling	doaj.art-d61bc4c192794481983ba08b224b34df 2023-07-02T11:22:15Z eng SpringerOpen EURASIP Journal on Audio, Speech, and Music Processing 1687-4722 2023-06-01 20231115 10.1186/s13636-023-00292-9 Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization Yuting Zhou 0 Hongjie Wan 1 Information Engineering Dept, Beijing University of Chemical Technology Information Engineering Dept, Beijing University of Chemical Technology Abstract The goal of sound event detection and localization (SELD) is to identify each individual sound event class and its activity time from a piece of audio, while estimating its spatial location at the time of activity. Conformer combines the advantages of convolutional layers and Transformer, which is effective in tasks such as speech recognition. However, it achieves high performance relying on complex network structure and a large number of computations. In the SELD task of this paper, we propose to use an encoder with a simpler network structure, called the dual-branch attention module (DBAM). The module is improved based on the conformer using two parallel branches of attention and convolution, which can model both global and local contextual information. We also blend low-level and high-level features of the localization task. In addition, we add soft parameter sharing to the joint SELD network, which can efficiently exploit the potential relationship between the two subtasks, SED and DOA. The proposed method can effectively detect two sound events with overlapping occurrence in the same time period. We experimented with the open dataset DCASE 2020 task 3 proving that the proposed method achieves better SELD performance than the baseline model. Furthermore, we conducted ablation experiments for verifying the effectiveness of the dual-branch attention module and soft parameter sharing. https://doi.org/10.1186/s13636-023-00292-9 Sound event detection and localization Conformer Attention mechanism Multi-task learning Soft parameter sharing
spellingShingle	Yuting Zhou Hongjie Wan Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization EURASIP Journal on Audio, Speech, and Music Processing Sound event detection and localization Conformer Attention mechanism Multi-task learning Soft parameter sharing
title	Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization
title_full	Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization
title_fullStr	Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization
title_full_unstemmed	Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization
title_short	Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization
title_sort	dual branch attention module based network with parameter sharing for joint sound event detection and localization
topic	Sound event detection and localization Conformer Attention mechanism Multi-task learning Soft parameter sharing
url	https://doi.org/10.1186/s13636-023-00292-9
work_keys_str_mv	AT yutingzhou dualbranchattentionmodulebasednetworkwithparametersharingforjointsoundeventdetectionandlocalization AT hongjiewan dualbranchattentionmodulebasednetworkwithparametersharingforjointsoundeventdetectionandlocalization

Similar Items