3 directional Inception-ResUNet: Deep spatial feature learning for multichannel singing voice separation with distortion.

Singing voice separation on robots faces the problem of interpreting ambiguous auditory signals. The acoustic signal, which the humanoid robot perceives through its onboard microphones, is a mixture of singing voice, music, and noise, with distortion, attenuation, and reverberation. In this paper, w...


Bibliographic Details
Main Authors:	DaDong Wang, Jie Wang, MingChen Sun
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2024-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0289453

author	DaDong Wang Jie Wang MingChen Sun
collection	DOAJ
description	Singing voice separation on robots faces the problem of interpreting ambiguous auditory signals. The acoustic signal, which the humanoid robot perceives through its onboard microphones, is a mixture of singing voice, music, and noise, with distortion, attenuation, and reverberation. In this paper, we used the 3D Inception-ResUNet structure in the U-shaped encoding and decoding network to improve the utilization of the spatial and spectral information of the spectrogram. Multiobjectives were used to train the model: magnitude consistency loss, phase consistency loss, and magnitude correlation consistency loss. We recorded the singing voice and accompaniment derived from the MIR-1K dataset with NAO robots and synthesized the 10-channel dataset for training the model. The experimental results show that the proposed model trained by multiple objectives reaches an average NSDR of 11.55 dB on the test dataset, which outperforms the comparison model.
first_indexed	2024-03-08T06:20:01Z
format	Article
id	doaj.art-0b796784f8df46eea81cde8f09ac923c
institution	Directory Open Access Journal
issn	1932-6203
language	English
last_indexed	2024-03-08T06:20:01Z
publishDate	2024-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj.art-0b796784f8df46eea81cde8f09ac923c | 2024-02-04T05:31:22Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2024-01-01 | 19 | 1 | e0289453 | 10.1371/journal.pone.0289453 | https://doi.org/10.1371/journal.pone.0289453
title	3 directional Inception-ResUNet: Deep spatial feature learning for multichannel singing voice separation with distortion.
url	https://doi.org/10.1371/journal.pone.0289453
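
The description above reports an average NSDR of 11.55 dB. As a rough illustration of what that metric measures, the sketch below computes a normalized SDR as the SDR of the separated estimate minus the SDR of the unprocessed mixture, both taken against the clean reference. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the energy-ratio SDR used here is a simplification of the BSS Eval definition, and the function names `sdr` and `nsdr` and the toy signals are hypothetical.

```python
import math


def sdr(reference, estimate):
    """Simplified signal-to-distortion ratio in dB.

    Assumption: plain energy ratio between the reference and the residual
    error, without the distortion-allowing projection used by BSS Eval.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0 or error_energy == 0:
        raise ValueError("SDR is undefined for a silent reference or a perfect estimate")
    return 10.0 * math.log10(signal_energy / error_energy)


def nsdr(reference, estimate, mixture):
    """Normalized SDR: gain of the estimate over the raw mixture, in dB."""
    return sdr(reference, estimate) - sdr(reference, mixture)


# Toy single-channel signals (hypothetical values, for illustration only).
reference = [1.0, -1.0, 1.0, -1.0]   # clean singing voice
mixture = [1.5, -0.5, 1.5, -0.5]     # voice plus accompaniment
estimate = [1.1, -0.9, 1.1, -0.9]    # separated voice

print(f"NSDR: {nsdr(reference, estimate, mixture):.2f} dB")
```

A positive NSDR means the separated signal is closer to the clean voice than the mixture was; averaging over the test clips yields a single figure comparable to the one quoted in the description.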


Similar Items