Graph Convolution-Based Deep Clustering for Speech Separation

Deep clustering is a promising technique for speech separation that is crucial to speech communication, acoustic target detection, acoustic enhancement and speech recognition. In the study of monophonic speech separation, the problem is that the decrease in separation and generalization performance...

Full description

Bibliographic Details
Main Authors: Shan Qin, Ting Jiang, Sheng Wu, Ning Wang, Xinran Zhao
Format: Article
Language:English
Published: IEEE 2020-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9076605/
Description
Summary:Deep clustering is a promising technique for speech separation that is crucial to speech communication, acoustic target detection, acoustic enhancement and speech recognition. In the study of monophonic speech separation, the problem is that the decrease in separation and generalization performance of the model in the case of reducing the variety of the training data set. In this paper, we propose a comprehensive deep clustering framework that construction the structural speech data based on GCN, named graph deep clustering (GDC) to further improve the separation performance of the separation model. In particular, embedding features are transformed into graph-structured data, and the speech separation mask is achieved by clustering these graph-structured data. Graph structural information aggregates nodes within a class, which makes feature representations conducive to clustering. Experimental results demonstrate that the proposed scheme can improve the clustering performance. The SDR of the separated speech is improved by about 1.2 dB, and the clustering accuracy is improved by 15%. We also use the perceptually motivated objective measures for the evaluation of audio source separation to score the speech quality. The target speech quality and the overall perceptual score are improved by 10.7% compared with other speech separation algorithms.
ISSN:2169-3536