Enhancing Target Speech Based on Nonlinear Soft Masking Using a Single Acoustic Vector Sensor

Enhancing speech captured by distant microphones is a challenging task. In this study, we investigate the multichannel signal properties of the single acoustic vector sensor (AVS) to obtain the inter-sensor data ratio (ISDR) model in the time-frequency (TF) domain. Then, the monotone functions descr...

Bibliographic Details
Main Authors:	Yuexian Zou, Zhaoyi Liu, Christian H. Ritz
Format:	Article
Language:	English
Published:	MDPI AG 2018-08-01
Series:	Applied Sciences
Subjects:	Direction of Arrival (DOA); time-frequency (TF) mask; speech sparsity; speech enhancement (SE); acoustic vector sensor (AVS); intelligent service robot
Online Access:	http://www.mdpi.com/2076-3417/8/9/1436

collection	DOAJ
description	Enhancing speech captured by distant microphones is a challenging task. In this study, we investigate the multichannel signal properties of the single acoustic vector sensor (AVS) to obtain the inter-sensor data ratio (ISDR) model in the time-frequency (TF) domain. Then, the monotone functions describing the relationship between the ISDRs and the direction of arrival (DOA) of the target speaker are derived. For the target speech enhancement (SE) task, the DOA of the target speaker is given, and the ISDRs are calculated. Hence, the TF components dominated by the target speech are extracted with high probability using the established monotone functions, and then, a nonlinear soft mask of the target speech is generated. As a result, a masking-based speech enhancement method is developed, which is termed the AVS-SMASK method. Extensive experiments with simulated data and recorded data have been carried out to validate the effectiveness of our proposed AVS-SMASK method in terms of suppressing spatial speech interferences and reducing the adverse impact of the additive background noise while maintaining less speech distortion. Moreover, our AVS-SMASK method is computationally inexpensive, and the AVS is of a small physical size. These merits are favorable to many applications, such as robot auditory systems.
first_indexed	2024-12-18T19:11:38Z
id	doaj.art-f6e7ad8506a04e2fa07a508d4cff3b61
institution	Directory of Open Access Journals
issn	2076-3417
last_indexed	2024-12-18T19:11:38Z
timestamp	2022-12-21T20:56:15Z
doi	10.3390/app8091436
volume	8
issue	9
article_number	1436
author_affiliation	Yuexian Zou: ADSPLAB, School of Electronic Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, China
author_affiliation	Zhaoyi Liu: ADSPLAB, School of Electronic Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, China
author_affiliation	Christian H. Ritz: School of Electrical, Computer, and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2500, Australia
title	Enhancing Target Speech Based on Nonlinear Soft Masking Using a Single Acoustic Vector Sensor

Similar Items