Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization
Abstract The goal of sound event detection and localization (SELD) is to identify each individual sound event class and its activity time from a piece of audio, while estimating its spatial location at the time of activity. Conformer combines the advantages of convolutional layers and Transformer, w...
Main Authors: | Yuting Zhou, Hongjie Wan |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2023-06-01 |
Series: | EURASIP Journal on Audio, Speech, and Music Processing |
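Subjects: | Sound event detection and localization; Conformer; Attention mechanism; Multi-task learning; Soft parameter sharing |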
Online Access: | https://doi.org/10.1186/s13636-023-00292-9 |
_version_ | 1797789663601098752 |
---|---|
author | Yuting Zhou, Hongjie Wan |
collection | DOAJ |
description | Sound event detection and localization (SELD) aims to identify the class and activity time of each sound event in an audio recording while estimating its spatial location during that activity. Conformer combines the advantages of convolutional layers and the Transformer and is effective in tasks such as speech recognition. However, its high performance relies on a complex network structure and a large amount of computation. For the SELD task, this paper proposes an encoder with a simpler network structure, called the dual-branch attention module (DBAM). The module is derived from the Conformer and uses two parallel branches, attention and convolution, to model both global and local contextual information. We also blend low-level and high-level features for the localization task. In addition, we add soft parameter sharing to the joint SELD network, which can efficiently exploit the potential relationship between the two subtasks, sound event detection (SED) and direction-of-arrival (DOA) estimation. The proposed method can effectively detect two sound events that overlap in the same time period. Experiments on the open DCASE 2020 Task 3 dataset show that the proposed method achieves better SELD performance than the baseline model. Furthermore, ablation experiments verify the effectiveness of the dual-branch attention module and soft parameter sharing. |
first_indexed | 2024-03-13T01:53:52Z |
id | doaj.art-d61bc4c192794481983ba08b224b34df |
institution | Directory of Open Access Journals |
issn | 1687-4722 |
last_indexed | 2024-03-13T01:53:52Z |
spelling | doaj.art-d61bc4c192794481983ba08b224b34df; 2023-07-02T11:22:15Z; eng; SpringerOpen; EURASIP Journal on Audio, Speech, and Music Processing; 1687-4722; 2023-06-01; 20231115; 10.1186/s13636-023-00292-9; Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization; Yuting Zhou (Information Engineering Dept, Beijing University of Chemical Technology); Hongjie Wan (Information Engineering Dept, Beijing University of Chemical Technology); https://doi.org/10.1186/s13636-023-00292-9; Sound event detection and localization; Conformer; Attention mechanism; Multi-task learning; Soft parameter sharing |
title | Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization |
topic | Sound event detection and localization; Conformer; Attention mechanism; Multi-task learning; Soft parameter sharing |
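
The description above mentions two parallel branches (attention for global context, convolution for local context) and soft parameter sharing between the SED and DOA subtasks. The following is a minimal pure-Python sketch of those two ideas, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the unparameterised attention, the single smoothing kernel shared across channels, the additive fusion with a residual connection, and the squared-L2 form of the sharing penalty (one common way to express soft sharing; the record does not say which form the paper uses).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_branch(x):
    """Global branch (assumed form): scaled dot-product self-attention
    over all time frames, with no learned projections."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(w[t] * x[t][c] for t in range(len(x))) for c in range(d)])
    return out


def conv_branch(x, kernel):
    """Local branch (assumed form): 1-D convolution along time, applied to
    each channel separately with zero padding; kernel length must be odd."""
    half = len(kernel) // 2
    n_frames, d = len(x), len(x[0])
    out = []
    for t in range(n_frames):
        frame = []
        for c in range(d):
            acc = 0.0
            for j, kv in enumerate(kernel):
                s = t + j - half
                if 0 <= s < n_frames:
                    acc += kv * x[s][c]
            frame.append(acc)
        out.append(frame)
    return out


def dual_branch_block(x, kernel):
    """Run both branches on the same input and add them to a residual.
    Additive fusion is an assumption; concatenation is another option."""
    global_out = attention_branch(x)
    local_out = conv_branch(x, kernel)
    return [
        [xi + gi + li for xi, gi, li in zip(x_row, g_row, l_row)]
        for x_row, g_row, l_row in zip(x, global_out, local_out)
    ]


def soft_sharing_penalty(sed_params, doa_params):
    """Hypothetical soft-sharing regulariser: squared L2 distance between
    corresponding weights of the SED and DOA branches."""
    return sum((a - b) ** 2 for a, b in zip(sed_params, doa_params))


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time frames, 2 channels
    print(dual_branch_block(frames, [0.25, 0.5, 0.25]))
    print(soft_sharing_penalty([1.0, 2.0], [1.0, 4.0]))
```

In a trained network, a penalty like this would be added to the task losses with a weighting factor, so the two subtask branches keep separate weights but are encouraged to stay close to each other.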