A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation

Bibliographic Details
Main Authors: Wupeng Xie, Xiaoxiao Xiang, Xiaojuan Zhang, Guanghong Liu
Format: Article
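Language: English
Published: MDPI AG 2023-01-01
Series: Symmetry
Subjects: multi-channel speech separation; beamforming; pre-separation module; all-neural; speech enhancement
Online Access: https://www.mdpi.com/2073-8994/15/2/261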
author Wupeng Xie
Xiaoxiao Xiang
Xiaojuan Zhang
Guanghong Liu
collection DOAJ
description Thanks to the use of deep neural networks (DNNs), microphone array speech separation methods have achieved impressive performance. However, most existing neural beamforming methods explicitly follow traditional beamformer formulas, which possibly causes sub-optimal performance. In this study, a pre-separation and all-neural beamformer framework is proposed for multi-channel speech separation without following the solutions of the conventional beamformers, such as the minimum variance distortionless response (MVDR) beamformer. More specifically, the proposed framework includes two modules, namely the pre-separation module and the all-neural beamforming module. The pre-separation module is used to obtain pre-separated speech and interference, which are further utilized by the all-neural beamforming module to obtain frame-level beamforming weights without computing the spatial covariance matrices. The evaluation results of the multi-channel speech separation tasks, including speech enhancement subtasks and speaker separation subtasks, demonstrate that the proposed method is more effective than several advanced baselines. Furthermore, this method can be used for symmetrical stereo speech.
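The description mentions frame-level beamforming weights but does not say how they are applied to the array signals. The following is a minimal sketch, assuming the standard filter-and-sum form in the STFT domain, where the output at each frame and frequency bin is the conjugate-weighted sum over microphones. The function name `apply_frame_level_weights`, the array layout `(mics, frames, freqs)`, and the uniform weights standing in for a network output are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def apply_frame_level_weights(weights, mixture_stft):
    """Filter-and-sum beamforming with one weight vector per frame and bin.

    Both arrays are complex with shape (mics, frames, freqs) (assumed layout).
    Returns y[t, f] = sum_m conj(w[m, t, f]) * x[m, t, f].
    """
    if weights.shape != mixture_stft.shape:
        raise ValueError("weights and mixture_stft must have the same shape")
    return np.einsum("mtf,mtf->tf", weights.conj(), mixture_stft)


# Toy input: a random 2-microphone STFT with 4 frames and 3 frequency bins.
rng = np.random.default_rng(0)
mics, frames, freqs = 2, 4, 3
x = rng.standard_normal((mics, frames, freqs)) + 1j * rng.standard_normal(
    (mics, frames, freqs)
)

# Hypothetical stand-in for the weights a neural beamformer would predict:
# uniform weights, which reduce the beamformer to averaging the channels.
w = np.full((mics, frames, freqs), 1.0 / mics, dtype=complex)

y = apply_frame_level_weights(w, x)
```

In this sketch the weights vary with the frame index, which is what distinguishes frame-level weights from a single time-invariant filter per frequency bin. No spatial covariance matrix is formed at any point; the weights are simply taken as given.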
first_indexed 2024-03-11T08:04:39Z
format Article
id doaj.art-2fbc0031349849bdbe8630925c3734f7
institution Directory Open Access Journal
issn 2073-8994
language English
last_indexed 2024-03-11T08:04:39Z
publishDate 2023-01-01
publisher MDPI AG
record_format Article
series Symmetry
doi 10.3390/sym15020261
volume 15
issue 2
article_number 261
affiliation Wupeng Xie: Information Science Academy, China Electronics Technology Group Corporation, Beijing 100041, China
Xiaoxiao Xiang: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
Xiaojuan Zhang: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
Guanghong Liu: Information Science Academy, China Electronics Technology Group Corporation, Beijing 100041, China
title A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation
topic multi-channel speech separation
beamforming
pre-separation module
all-neural
speech enhancement
url https://www.mdpi.com/2073-8994/15/2/261