Enhancement of Coded Speech Using Neural Network-Based Side Information

Audio codecs generate notable artifacts when operating at low bitrates, which degrade the quality of the coded audio significantly. There have been several approaches to enhance the quality of decoded signals with and without side information. While pre- or post-processing approaches without side in...

Bibliographic Details
Main Authors: Soojoong Hwang, Youngju Cheon, Sangwook Han, Inseon Jang, Jong Won Shin
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Audio codec, speech codec, side information, deep neural network, decoded signal enhancement
Online Access: https://ieeexplore.ieee.org/document/9524924/
author Soojoong Hwang
Youngju Cheon
Sangwook Han
Inseon Jang
Jong Won Shin
collection DOAJ
description Audio codecs generate notable artifacts when operating at low bitrates, which significantly degrade the quality of the coded audio. Several approaches have been proposed to enhance the quality of decoded signals, with and without side information. While pre- or post-processing approaches without side information can be applied directly to existing systems without modifying codecs, approaches utilizing side information can further enhance performance while maintaining backward compatibility with existing codecs. In this paper, we propose a method to improve decoded signals using neural network-based side information. A neural network on the transmitter side that generates the side information and another neural network on the receiver side that estimates the log power spectra (LPS) of the original signal from the decoded signal and the side information are jointly trained to accurately reconstruct the original signal. In line with the analysis-by-synthesis principle, the encoded bitstream is decoded at the transmitter side so that the neural network generating the side information takes as input not only the LPS of the original signal but also the LPS of the decoded signal. Experimental results show that the proposed audio codec enhancement scheme using neural network-based side information outperformed audio codec enhancement without side information for the same codec operating at higher bitrates.
first_indexed 2024-12-16T13:05:15Z
format Article
id doaj.art-0cf9d93b5de146b6b03eadc9407b2178
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-16T13:05:15Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-0cf9d93b5de146b6b03eadc9407b2178
2022-12-21T22:30:45Z
eng
IEEE Access, ISSN 2169-3536, 2021-01-01, vol. 9, pp. 121532-121540
DOI: 10.1109/ACCESS.2021.3108784
IEEE document number: 9524924
Soojoong Hwang (0): School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Buk-gu, South Korea
Youngju Cheon (1): School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Buk-gu, South Korea
Sangwook Han (2): School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Buk-gu, South Korea
Inseon Jang (3), https://orcid.org/0000-0003-2237-2668: Electronics and Telecommunications Research Institute, Daejeon, Yuseong-gu, South Korea
Jong Won Shin (4), https://orcid.org/0000-0002-8910-0264: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Buk-gu, South Korea
title Enhancement of Coded Speech Using Neural Network-Based Side Information
topic Audio codec
speech codec
side information
deep neural network
decoded signal enhancement
url https://ieeexplore.ieee.org/document/9524924/
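
The data flow summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the record gives no network architecture, side-information size, frame length, or loss function, so the two small MLPs, `side_dim = 8`, the 512-point FFT, the mean-squared-error loss, and the noisy copy standing in for the codec output are all hypothetical choices made for illustration. Only the forward pass is shown. In the paper both networks are trained jointly, and training and any quantization of the side information are omitted here.

```python
import numpy as np


def log_power_spectrum(frame, n_fft=512, eps=1e-10):
    """Log power spectrum (LPS) of one frame: log |rFFT(windowed frame)|^2."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    return np.log(np.abs(spectrum) ** 2 + eps)


def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    return [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


rng = np.random.default_rng(0)
n_fft = 512
n_bins = n_fft // 2 + 1   # number of rFFT bins for a 512-point FFT
side_dim = 8              # hypothetical size of the side information

# Stand-ins for one frame of the original signal and its decoded version.
# A real system would run the actual codec; here noise is simply added.
original = rng.standard_normal(n_fft)
decoded = original + 0.1 * rng.standard_normal(n_fft)

lps_orig = log_power_spectrum(original, n_fft)
lps_dec = log_power_spectrum(decoded, n_fft)

# Transmitter side: sees both the original and the locally decoded signal
# (the analysis-by-synthesis idea) and produces the side information.
tx_net = init_mlp(rng, [2 * n_bins, 64, side_dim])
side_info = mlp(tx_net, np.concatenate([lps_orig, lps_dec]))

# Receiver side: sees only the decoded signal plus the side information
# and estimates the LPS of the original signal.
rx_net = init_mlp(rng, [n_bins + side_dim, 64, n_bins])
lps_est = mlp(rx_net, np.concatenate([lps_dec, side_info]))

# Assumed joint objective: MSE between estimated and original LPS.
loss = np.mean((lps_est - lps_orig) ** 2)
print(side_info.shape, lps_est.shape, float(loss))
```

Because the loss depends on both `tx_net` and `rx_net`, minimizing it with any gradient-based optimizer would update the two networks together, which is the sense in which the abstract calls them jointly trained.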