A Survey of Audio Enhancement Algorithms for Music, Speech, Bioacoustics, Biomedical, Industrial, and Environmental Sounds by Image U-Net

The recent surge in the use of Deep Neural Networks (DNNs) has also made its mark on the field of Audio Enhancement (AE), providing much better quality than classical methods. Although dedicated audio-processing DNNs exist, many recent AE models have used U-Net: a DNN based on C...

Bibliographic Details
Main Authors:	Sania Gul, Muhammad Salman Khan
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	CNNs; image processing deep neural networks; pre-trained networks; spectrogram; U-Net
Online Access:	https://ieeexplore.ieee.org/document/10371226/

_version_	1827389350921371648
author	Sania Gul, Muhammad Salman Khan
author_facet	Sania Gul, Muhammad Salman Khan
author_sort	Sania Gul
collection	DOAJ
description	The recent surge in the use of Deep Neural Networks (DNNs) has also made its mark on the field of Audio Enhancement (AE), providing much better quality than classical methods. Although dedicated audio-processing DNNs exist, many recent AE models have used U-Net, a DNN based on the Convolutional Neural Network (CNN) and originally developed for image segmentation. Useful features hidden in the time domain are highlighted when the audio signal is converted to a spectrogram, which can be treated as an image. This article reviews recent work that uses U-Nets for different AE applications. Unlike other published reviews, this review focuses entirely on AE techniques based on image U-Nets. We discuss the need for AE, how U-Net compares with other DNNs, the benefits of converting audio to 2D, input representations that are useful for different AE applications, the architecture of the vanilla U-Net and the pre-trained models, variations of the vanilla architecture incorporated in different AE models, and the state-of-the-art AE algorithms based on U-Net in various applications. Apart from speech and music, this article covers a wide range of audio signals, e.g., environmental, biomedical, bioacoustic, and industrial sounds, which previously published studies have not covered collectively in a single article. The article ends with a discussion of colored spectrograms in future AE applications.
first_indexed	2024-03-08T16:33:26Z
format	Article
id	doaj.art-966df5c1e3754f17850621b1ae48408c
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-03-08T16:33:26Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-966df5c1e3754f17850621b1ae48408c; 2024-01-06T00:01:07Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; volume 11; pages 144456-144483; DOI 10.1109/ACCESS.2023.3344813; document 10371226; Sania Gul (https://orcid.org/0000-0003-4751-2997), Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Peshawar, Pakistan; Muhammad Salman Khan (https://orcid.org/0000-0001-9709-8179), Department of Electrical Engineering, College of Engineering, Qatar University, Doha, Qatar
title	A Survey of Audio Enhancement Algorithms for Music, Speech, Bioacoustics, Biomedical, Industrial, and Environmental Sounds by Image U-Net
topic	CNNs; image processing deep neural networks; pre-trained networks; spectrogram; U-Net
url	https://ieeexplore.ieee.org/document/10371226/

Similar Items