A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation
Thanks to the use of deep neural networks (DNNs), microphone array speech separation methods have achieved impressive performance. However, most existing neural beamforming methods explicitly follow traditional beamformer formulas, which possibly causes sub-optimal performance. In this study, a pre-...
Main Authors: | Wupeng Xie, Xiaoxiao Xiang, Xiaojuan Zhang, Guanghong Liu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-01-01 |
Series: | Symmetry |
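Subjects: | multi-channel speech separation; beamforming; pre-separation module; all-neural; speech enhancement |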
Online Access: | https://www.mdpi.com/2073-8994/15/2/261 |
_version_ | 1797618115089006592 |
---|---|
author | Wupeng Xie; Xiaoxiao Xiang; Xiaojuan Zhang; Guanghong Liu |
collection | DOAJ |
description | Thanks to the use of deep neural networks (DNNs), microphone array speech separation methods have achieved impressive performance. However, most existing neural beamforming methods explicitly follow traditional beamformer formulas, which possibly causes sub-optimal performance. In this study, a pre-separation and all-neural beamformer framework is proposed for multi-channel speech separation without following the solutions of the conventional beamformers, such as the minimum variance distortionless response (MVDR) beamformer. More specifically, the proposed framework includes two modules, namely the pre-separation module and the all-neural beamforming module. The pre-separation module is used to obtain pre-separated speech and interference, which are further utilized by the all-neural beamforming module to obtain frame-level beamforming weights without computing the spatial covariance matrices. The evaluation results of the multi-channel speech separation tasks, including speech enhancement subtasks and speaker separation subtasks, demonstrate that the proposed method is more effective than several advanced baselines. Furthermore, this method can be used for symmetrical stereo speech. |
first_indexed | 2024-03-11T08:04:39Z |
format | Article |
id | doaj.art-2fbc0031349849bdbe8630925c3734f7 |
institution | Directory of Open Access Journals |
issn | 2073-8994 |
language | English |
last_indexed | 2024-03-11T08:04:39Z |
publishDate | 2023-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Symmetry |
spelling | doaj.art-2fbc0031349849bdbe8630925c3734f7; 2023-11-16T23:30:57Z; eng; MDPI AG; Symmetry; ISSN 2073-8994; 2023-01-01; vol. 15, no. 2, art. 261; DOI 10.3390/sym15020261; A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation; Wupeng Xie (Information Science Academy, China Electronics Technology Group Corporation, Beijing 100041, China); Xiaoxiao Xiang (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China); Xiaojuan Zhang (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China); Guanghong Liu (Information Science Academy, China Electronics Technology Group Corporation, Beijing 100041, China); https://www.mdpi.com/2073-8994/15/2/261; multi-channel speech separation; beamforming; pre-separation module; all-neural; speech enhancement |
title | A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation |
topic | multi-channel speech separation; beamforming; pre-separation module; all-neural; speech enhancement |
url | https://www.mdpi.com/2073-8994/15/2/261 |