Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients

Bibliographic Details
Main Authors:	Sören Schulze, Johannes Leuschner, Emily J. King
Format:	Article
Language:	English
Published:	MDPI AG 2021-10-01
Series:	Signals
Subjects:	blind source separation; policy gradient; neural network; dictionary learning; parametric model; unsupervised learning
Online Access:	https://www.mdpi.com/2624-6120/2/4/39

collection	DOAJ
description	We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual time frames of the short-time Fourier transform. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.
first_indexed	2024-03-10T03:05:27Z
id	doaj.art-198962f0d8e84b38bb5d05c86cfe40d8
institution	Directory of Open Access Journals
issn	2624-6120
last_indexed	2024-03-10T03:05:27Z
spelling	doaj.art-198962f0d8e84b38bb5d05c86cfe40d8 | 2023-11-23T10:32:59Z | eng | MDPI AG | Signals | ISSN 2624-6120 | 2021-10-01 | vol. 2, no. 4, pp. 637-661 | doi:10.3390/signals2040039 | Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients | Sören Schulze (Center for Industrial Mathematics, University of Bremen, Bibliothekstr. 5, 28359 Bremen, Germany) | Johannes Leuschner (Center for Industrial Mathematics, University of Bremen, Bibliothekstr. 5, 28359 Bremen, Germany) | Emily J. King (Mathematics Department, Colorado State University, 1874 Campus Delivery, 111 Weber Bldg, Fort Collins, CO 80523, USA) | https://www.mdpi.com/2624-6120/2/4/39 | blind source separation; policy gradient; neural network; dictionary learning; parametric model; unsupervised learning
title	Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients
