Enhancement of Coded Speech Using Neural Network-Based Side Information
Audio codecs generate notable artifacts when operating at low bitrates, which degrade the quality of the coded audio significantly. There have been several approaches to enhance the quality of decoded signals with and without side information. While pre- or post-processing approaches without side in...
Main Authors: | Soojoong Hwang, Youngju Cheon, Sangwook Han, Inseon Jang, Jong Won Shin |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
Subjects: | Audio codec; speech codec; side information; deep neural network; decoded signal enhancement |
Online Access: | https://ieeexplore.ieee.org/document/9524924/ |
_version_ | 1818602310453952512 |
---|---|
author | Soojoong Hwang; Youngju Cheon; Sangwook Han; Inseon Jang; Jong Won Shin |
author_facet | Soojoong Hwang; Youngju Cheon; Sangwook Han; Inseon Jang; Jong Won Shin |
author_sort | Soojoong Hwang |
collection | DOAJ |
description | Audio codecs generate notable artifacts when operating at low bitrates, which degrade the quality of the coded audio significantly. There have been several approaches to enhance the quality of decoded signals with and without side information. While pre- or post-processing approaches without side information can be applied directly to existing systems without modifying codecs, approaches utilizing side information can further enhance the performance while maintaining backward compatibility with existing codecs. In this paper, we propose a method to improve decoded signals using neural network-based side information. A neural network on the transmitter side that generates the side information and another neural network on the receiver side that estimates the log power spectra (LPS) of the original signal from the decoded signal and the side information are jointly trained to accurately reconstruct the original signal. In line with the analysis-by-synthesis principle, the encoded bitstream is also decoded on the transmitter side, so that the neural network generating the side information takes as input not only the LPS of the original signal but also the LPS of the decoded signal. Experimental results show that the proposed audio codec enhancement scheme using neural network-based side information outperformed the audio codec enhancement without side information for the same codec operating at higher bitrates. |
first_indexed | 2024-12-16T13:05:15Z |
format | Article |
id | doaj.art-0cf9d93b5de146b6b03eadc9407b2178 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-16T13:05:15Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-0cf9d93b5de146b6b03eadc9407b2178; 2022-12-21T22:30:45Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; volume 9; pages 121532-121540; DOI 10.1109/ACCESS.2021.3108784; document 9524924; Enhancement of Coded Speech Using Neural Network-Based Side Information; Soojoong Hwang (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Buk-gu, South Korea); Youngju Cheon (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Buk-gu, South Korea); Sangwook Han (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Buk-gu, South Korea); Inseon Jang (https://orcid.org/0000-0003-2237-2668; Electronics and Telecommunications Research Institute, Daejeon, Yuseong-gu, South Korea); Jong Won Shin (https://orcid.org/0000-0002-8910-0264; School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Buk-gu, South Korea); https://ieeexplore.ieee.org/document/9524924/; Audio codec; speech codec; side information; deep neural network; decoded signal enhancement |
title | Enhancement of Coded Speech Using Neural Network-Based Side Information |
topic | Audio codec; speech codec; side information; deep neural network; decoded signal enhancement |
url | https://ieeexplore.ieee.org/document/9524924/ |
work_keys_str_mv | AT soojoonghwang enhancementofcodedspeechusingneuralnetworkbasedsideinformation AT youngjucheon enhancementofcodedspeechusingneuralnetworkbasedsideinformation AT sangwookhan enhancementofcodedspeechusingneuralnetworkbasedsideinformation AT inseonjang enhancementofcodedspeechusingneuralnetworkbasedsideinformation AT jongwonshin enhancementofcodedspeechusingneuralnetworkbasedsideinformation |
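
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the network shapes, the `TinyMLP` class, the side-information size `side_dim`, the noise-based stand-in for the codec, and the mean-squared-error objective are all hypothetical choices made here for illustration, and any quantization or bit allocation of the side information is left out. The sketch only shows which signals feed which network: the transmitter network sees the LPS of both the original and the locally decoded signal, while the receiver network sees the LPS of the decoded signal plus the side information.

```python
import numpy as np


def log_power_spectrum(frame, n_fft=512, eps=1e-10):
    """Log power spectrum (LPS) of one Hann-windowed time-domain frame."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    return np.log(np.abs(spectrum) ** 2 + eps)


class TinyMLP:
    """Hypothetical one-hidden-layer network with random, untrained weights."""

    def __init__(self, rng, in_dim, hidden_dim, out_dim):
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        hidden = np.tanh(x @ self.w1 + self.b1)
        return hidden @ self.w2 + self.b2


rng = np.random.default_rng(0)
n_fft = 512
n_bins = n_fft // 2 + 1   # number of rfft bins
side_dim = 8              # assumed size of the side information vector

# Stand-in signals: additive noise plays the role of codec distortion here.
original = rng.standard_normal(n_fft)
decoded = original + 0.3 * rng.standard_normal(n_fft)

lps_orig = log_power_spectrum(original, n_fft)
lps_dec = log_power_spectrum(decoded, n_fft)

# Transmitter side: has access to both the original and the decoded LPS.
transmitter = TinyMLP(rng, 2 * n_bins, 64, side_dim)
side_info = transmitter(np.concatenate([lps_orig, lps_dec]))

# Receiver side: only the decoded LPS and the transmitted side information.
receiver = TinyMLP(rng, n_bins + side_dim, 64, n_bins)
lps_est = receiver(np.concatenate([lps_dec, side_info]))

# A joint objective would backpropagate this error through both networks.
loss = float(np.mean((lps_est - lps_orig) ** 2))
print(side_info.shape, lps_est.shape, loss)
```

Because the loss depends on the receiver output, which in turn depends on the transmitter output, minimizing it with a gradient-based optimizer would update both networks together, which is the sense in which the abstract calls them jointly trained.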