Improving Impulse Audio Source Separation using Generative Adversarial Networks for Phase Generation

Bibliographic Details
Main Author:	Piercy, Phoebe K.
Other Authors:	Lang, Jeffrey H.
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/138956

author	Piercy, Phoebe K.
author2	Lang, Jeffrey H.
collection	MIT
description	This thesis explored separating impulse noise from a desired signal, with applications to hearing protection for soldiers and musicians. An evaluation of current source separation techniques, such as matrix demixing methods (Independent Component Analysis, Independent Vector Analysis) and masking methods (Ideal Ratio Mask, Ideal Binary Mask), amongst others, concluded that Time-Frequency masking of the noisy signal spectrogram was the best candidate audio separation method for dynamic soundscapes such as tactical fields and music. We followed with an experimental investigation of the role of phase in Time-Frequency masking, finding phase to be of paramount importance to speech intelligibility. In particular, the construction of a Complex Ideal Ratio Mask (cIRM), which alters both the magnitude and the phase information in the spectrogram, was identified as the most promising method of impulse source separation, with separated speech intelligibility comparable to that of clean speech. This motivated us to develop a method to approximate the cIRM without prior knowledge of the source signals. To that end, the growing use of neural networks as a tool for source separation and phase estimation was presented and evaluated. Experiments evaluated the potential of Generative Adversarial Networks (GANs), which are often used for image transformation, to generate the phase of the cIRM, and human test subjects assessed whether the intelligibility of the separated speech improved. The GAN showed promise in generating phase-like results, although imperfect transformation resulted in an audible decrease in quality, suggesting that the approach is unlikely to produce the natural sound required by musicians. However, for the tactical case, where intelligibility is valued over quality, consonant reconstruction and improved impulse attenuation were observed using our GAN-estimated cIRM.
This improvement was reflected in an increase in the signal-to-noise ratio (SNR) of the separated output measured against the clean speech, and a decrease in the SNR measured against the impulse noise, indicating a larger clean-speech contribution and a smaller impulse-noise contribution in the separated output. These results show the potential, with better resources, for GAN-generated phase to improve intelligibility during audio source separation of impulse noise from speech, and motivate further exploration of this topic.
first_indexed	2024-09-23T13:34:58Z
format	Thesis
id	mit-1721.1/138956
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T13:34:58Z
publishDate	2022
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/138956 2022-01-15T03:26:16Z Improving Impulse Audio Source Separation using Generative Adversarial Networks for Phase Generation Piercy, Phoebe K. Lang, Jeffrey H. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-01-14T14:40:41Z 2022-01-14T14:40:41Z 2021-06 2021-06-17T20:14:04.111Z Thesis https://hdl.handle.net/1721.1/138956 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title	Improving Impulse Audio Source Separation using Generative Adversarial Networks for Phase Generation
url	https://hdl.handle.net/1721.1/138956
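
The description names three oracle time-frequency masks: the Ideal Binary Mask (IBM), the Ideal Ratio Mask (IRM) and the Complex Ideal Ratio Mask (cIRM). The following minimal Python sketch uses common textbook definitions of these masks; they are assumptions and not necessarily the exact formulations used in the thesis. The IBM thresholds the local speech-to-noise ratio at a hypothetical criterion lc_db, the IRM is a real-valued gain with exponent beta, and the cIRM is the complex ratio S/Y, so it corrects phase as well as magnitude. All three require the clean speech S, which is why the thesis sought to approximate the cIRM without prior source information. The spectrograms are random stand-in arrays rather than recorded speech or impulse noise, and snr_db is a hypothetical helper that measures SNR against a reference signal.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero in silent time-frequency bins


def ideal_binary_mask(S, N, lc_db=0.0):
    """IBM: keep a bin (1) if its local speech-to-noise ratio exceeds lc_db, else drop it (0)."""
    local_snr_db = 10 * np.log10((np.abs(S) ** 2 + EPS) / (np.abs(N) ** 2 + EPS))
    return (local_snr_db > lc_db).astype(float)


def ideal_ratio_mask(S, N, beta=0.5):
    """IRM: real-valued gain in [0, 1]; the mixture phase is left unchanged."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return (ps / (ps + pn + EPS)) ** beta


def complex_ideal_ratio_mask(S, Y):
    """cIRM: complex gain M with M * Y ~= S, so both magnitude and phase are corrected."""
    yr, yi, sr, si = Y.real, Y.imag, S.real, S.imag
    denom = yr ** 2 + yi ** 2 + EPS
    return (yr * sr + yi * si) / denom + 1j * (yr * si - yi * sr) / denom


def snr_db(reference, estimate):
    """SNR (dB) of an estimate measured against a reference signal."""
    err = reference - estimate
    return 10 * np.log10(np.sum(np.abs(reference) ** 2) / (np.sum(np.abs(err) ** 2) + EPS))


# Stand-in STFTs: random complex arrays, NOT real speech or impulse recordings.
rng = np.random.default_rng(0)
shape = (257, 100)  # hypothetical frequency bins x frames
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)        # "speech"
N = 3 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))  # "noise"
Y = S + N                                                               # mixture

masks = {
    "IBM": ideal_binary_mask(S, N),
    "IRM": ideal_ratio_mask(S, N),
    "cIRM": complex_ideal_ratio_mask(S, Y),
}
results = {name: snr_db(S, m * Y) for name, m in masks.items()}
for name, value in results.items():
    print(f"{name}: {value:.1f} dB SNR against the clean reference")
```

Because the cIRM reproduces the clean spectrogram almost exactly, its SNR against the clean reference is limited only by the small EPS regularizer, whereas the real-valued IBM and IRM keep the noisy mixture phase.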