Investigation of Machine Learning Model Flexibility for Automatic Application of Reverberation Effect on Audio Signal

Bibliographic Details
Main Authors: Mantas Tamulionis, Tomyslav Sledevič, Artūras Serackis
Format: Article
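Language: English
Published: MDPI AG 2023-05-01
Series: Applied Sciences
Subjects: room reverberation; room impulse response; recurrent neural networks; audio signal spectrum; filter bank
Online Access: https://www.mdpi.com/2076-3417/13/9/5604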
author Mantas Tamulionis
Tomyslav Sledevič
Artūras Serackis
collection DOAJ
description This paper discusses an algorithm that attempts to automatically estimate the effect of room reverberation by training a recurrent neural network model on anechoic and reverberant sound samples. Modelling a room impulse response (RIR) recorded at a 44.1 kHz sampling rate with a system identification-based approach in the time domain is prohibitively complex, even with deep learning models, and it is almost impossible to learn the model parameters automatically for reverberation times longer than 1 s. The paper therefore presents a method that models the reverberated audio signal in the frequency domain. To reduce complexity, the spectrum is analyzed on a logarithmic scale that reflects the subjective characteristics of human hearing: the 20–20,000 Hz range is treated as 10 octaves, and each octave is divided into 1/3-octave or 1/12-octave bands. This keeps the relative resolution equal at low, mid, and high frequencies. The study examines three recurrent network structures (LSTM, BiLSTM, and GRU) and compares different sizes of the two hidden layers. The experiments compare modelling performance when each octave of the spectrum is divided into a different number of bands, and assess the feasibility of using a single model to predict the spectrum of reverberated audio in adjacent frequency bands. The paper also presents and describes in detail a new RIR dataset that, although synthetic, is calibrated with recorded impulse responses.
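The band division mentioned in the description can be illustrated with a minimal sketch. It assumes band edges spaced by a constant ratio of 2^(1/b) starting at 20 Hz, with the top band clipped at 20,000 Hz; the function name `fractional_octave_bands` and these edge conventions are illustrative assumptions, not the authors' published filter bank.

```python
import math

def fractional_octave_bands(f_low=20.0, f_high=20000.0, bands_per_octave=3):
    """Return (lower, center, upper) band edges in Hz covering f_low..f_high.

    Assumption: edges grow by a constant ratio 2**(1/bands_per_octave),
    and the last band is clipped at f_high. This is a generic
    fractional-octave layout, not the paper's exact filter bank.
    """
    n_octaves = math.log2(f_high / f_low)  # about 9.97 for 20 Hz..20 kHz
    n_bands = math.ceil(round(n_octaves * bands_per_octave, 9))
    bands = []
    for k in range(n_bands):
        lower = f_low * 2.0 ** (k / bands_per_octave)
        upper = min(f_low * 2.0 ** ((k + 1) / bands_per_octave), f_high)
        center = math.sqrt(lower * upper)  # geometric mean of the two edges
        bands.append((lower, center, upper))
    return bands

third_octave = fractional_octave_bands(bands_per_octave=3)
twelfth_octave = fractional_octave_bands(bands_per_octave=12)
print(len(third_octave), len(twelfth_octave))  # 30 120
```

Under these assumptions the 1/3-octave split gives 30 bands and the 1/12-octave split gives 120, and every band (apart from the clipped top one) has the same upper-to-lower ratio, which is what equal resolution on a logarithmic frequency axis means.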
first_indexed 2024-03-11T04:23:48Z
format Article
id doaj.art-bae4c5a0291447608cc142dc4b23626d
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-11T04:23:48Z
publishDate 2023-05-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-bae4c5a0291447608cc142dc4b23626d; 2023-11-17T22:36:19Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2023-05-01; 13; 9; 5604; 10.3390/app13095604
Investigation of Machine Learning Model Flexibility for Automatic Application of Reverberation Effect on Audio Signal
Mantas Tamulionis (0); Tomyslav Sledevič (1); Artūras Serackis (2)
(0, 1, 2) Department of Electronic Systems, Vilnius Gediminas Technical University (VILNIUS TECH), Plytinės g. 25, LT-10105 Vilnius, Lithuania
https://www.mdpi.com/2076-3417/13/9/5604
room reverberation; room impulse response; recurrent neural networks; audio signal spectrum; filter bank
title Investigation of Machine Learning Model Flexibility for Automatic Application of Reverberation Effect on Audio Signal
topic room reverberation
room impulse response
recurrent neural networks
audio signal spectrum
filter bank
url https://www.mdpi.com/2076-3417/13/9/5604