Graph Convolution-Based Deep Clustering for Speech Separation
Deep clustering is a promising technique for speech separation, which is crucial to speech communication, acoustic target detection, acoustic enhancement and speech recognition. In monophonic speech separation, a key problem is the decrease in separation and generalization performance...
Main Authors: | Shan Qin, Ting Jiang, Sheng Wu, Ning Wang, Xinran Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
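Subjects: | Construction of graph-structured data; deep clustering; graph convolutional filter; speech separation |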
Online Access: | https://ieeexplore.ieee.org/document/9076605/ |
_version_ | 1818857044643414016 |
---|---|
author | Shan Qin; Ting Jiang; Sheng Wu; Ning Wang; Xinran Zhao |
author_facet | Shan Qin; Ting Jiang; Sheng Wu; Ning Wang; Xinran Zhao |
author_sort | Shan Qin |
collection | DOAJ |
description | Deep clustering is a promising technique for speech separation, which is crucial to speech communication, acoustic target detection, acoustic enhancement and speech recognition. In monophonic speech separation, a key problem is that the separation and generalization performance of the model decreases when the variety of the training data set is reduced. In this paper, we propose a comprehensive deep clustering framework, named graph deep clustering (GDC), that constructs structured speech data based on a graph convolutional network (GCN) to further improve the performance of the separation model. In particular, embedding features are transformed into graph-structured data, and the speech separation mask is obtained by clustering these graph-structured data. Graph structural information aggregates nodes within a class, which makes the feature representations conducive to clustering. Experimental results demonstrate that the proposed scheme improves the clustering performance. The SDR of the separated speech is improved by about 1.2 dB, and the clustering accuracy is improved by 15%. We also use the perceptually motivated objective measures for the evaluation of audio source separation to score the speech quality. The target speech quality and the overall perceptual score are improved by 10.7% compared with other speech separation algorithms. |
first_indexed | 2024-12-19T08:34:08Z |
format | Article |
id | doaj.art-b0791a734ea841bba8246434680072b7 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-19T08:34:08Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-b0791a734ea841bba8246434680072b7; 2022-12-21T20:29:06Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 82571; 82580; 10.1109/ACCESS.2020.2989833; 9076605; Graph Convolution-Based Deep Clustering for Speech Separation; Shan Qin (0), https://orcid.org/0000-0002-9985-3163; Ting Jiang (1), https://orcid.org/0000-0003-3598-3804; Sheng Wu (2), https://orcid.org/0000-0002-9947-9968; Ning Wang (3), https://orcid.org/0000-0003-1381-7952; Xinran Zhao (4), https://orcid.org/0000-0002-6977-6822; (0) Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; (1) Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; (2) Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; (3) Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA, USA; (4) Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, China; https://ieeexplore.ieee.org/document/9076605/; Construction of graph-structured data; deep clustering; graph convolutional filter; speech separation |
title | Graph Convolution-Based Deep Clustering for Speech Separation |
title_sort | graph convolution based deep clustering for speech separation |
topic | Construction of graph-structured data; deep clustering; graph convolutional filter; speech separation |
url | https://ieeexplore.ieee.org/document/9076605/ |
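
The description in this record outlines a pipeline: embedding features are turned into graph-structured data, graph structure aggregates nodes within a class, and clustering yields the separation mask. The following is a minimal sketch of that general idea, not the authors' GDC implementation. Everything specific in it is an assumption made for illustration: the function names, the cosine k-nearest-neighbour graph, the weight-free mean aggregation standing in for a trained graph convolutional filter, the plain k-means step, and the synthetic embeddings.

```python
import numpy as np


def knn_adjacency(x, k):
    """Symmetric k-nearest-neighbour adjacency built from cosine similarity."""
    unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # a node is not its own neighbour
    idx = np.argsort(-sim, axis=1)[:, :k]     # k most similar nodes per row
    adj = np.zeros((x.shape[0], x.shape[0]))
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # make the graph undirected


def graph_filter(x, adj, hops=2):
    """Weight-free graph convolution: average each node with its neighbours."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)  # rows sum to one
    for _ in range(hops):
        x = a_norm @ x
    return x


def kmeans(x, n_clusters, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, n_clusters):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def separation_masks(embeddings, n_frames, n_bins, n_speakers=2, k=5):
    """Return one binary time-frequency mask per speaker."""
    adj = knn_adjacency(embeddings, k)
    smoothed = graph_filter(embeddings, adj)
    labels = kmeans(smoothed, n_speakers)
    return np.stack(
        [(labels == c).reshape(n_frames, n_bins) for c in range(n_speakers)]
    )


# Synthetic stand-in for learned embeddings: one vector per time-frequency bin,
# drawn around one of two hypothetical speaker directions.
n_frames, n_bins, dim = 10, 8, 4
true_labels = np.arange(n_frames * n_bins) % 2
directions = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
noise = 0.05 * rng.normal(size=(n_frames * n_bins, dim))
embeddings = directions[true_labels] + noise

masks = separation_masks(embeddings, n_frames, n_bins)
```

In a real system the embeddings would come from a trained network over the mixture spectrogram, and each mask would be multiplied with the mixture spectrogram before inverting it back to a waveform. The mean aggregation here only illustrates why neighbourhood averaging pulls same-class nodes together before clustering.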