SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources

Bibliographic Details
Main Authors: Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10366219/
Description
Summary: Restoring high-quality speech from degraded historical recordings is crucial for the preservation of cultural and endangered linguistic resources. A key challenge in this task is the scarcity of paired training data that replicate the original acoustic conditions of the historical audio. While previous approaches have used pseudo paired data generated by applying various distortions to clean speech corpora, their limitations stem from the inability to authentically simulate the acoustic variations in historical recordings. We propose a self-supervised approach to speech restoration that does not require paired corpora. Our model has three main modules: analysis, synthesis, and channel modules, all of which are designed to emulate the recording process of degraded audio signals. The analysis module disentangles undistorted speech and distortion features, and the synthesis module generates the restored speech waveform. The channel module then introduces distortions into the speech waveform to compute the reconstruction loss between the input and output degraded speech signals. We further improve our model by introducing several methods including dual learning and semi-supervised learning. An additional feature of our model is the audio effect transfer, which allows acoustic distortions from degraded audio signals to be applied to arbitrary audio signals. Experimental evaluations demonstrated that our approach significantly outperforms the previous supervised approach for the restoration of real historical speech resources.
ISSN: 2169-3536
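
Illustration: The data flow described in the summary (analysis, then synthesis, then channel, with a reconstruction loss between the input and the re-degraded output) can be sketched in a few lines of Python. This is a toy sketch only, not the authors' implementation. The paper's modules are learned models, whereas here every function is a hypothetical stand-in, and the "distortion" is assumed to be a single gain factor so that the loop can be followed by hand. All names (`analysis`, `synthesis`, `channel`, `DistortionFeatures`, `transfer_effect`) are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Waveform = List[float]


@dataclass
class DistortionFeatures:
    # Hypothetical: one attenuation factor stands in for learned channel features.
    gain: float


def analysis(degraded: Waveform) -> Tuple[Waveform, DistortionFeatures]:
    """Toy analysis: split the input into 'speech features' and 'distortion features'."""
    peak = max((abs(x) for x in degraded), default=0.0) or 1.0
    speech_features = [x / peak for x in degraded]
    return speech_features, DistortionFeatures(gain=peak)


def synthesis(speech_features: Waveform) -> Waveform:
    """Toy synthesis: the restored waveform is just the normalized features."""
    return list(speech_features)


def channel(restored: Waveform, distortion: DistortionFeatures) -> Waveform:
    """Toy channel: re-apply the estimated distortion to a waveform."""
    return [x * distortion.gain for x in restored]


def reconstruction_loss(a: Waveform, b: Waveform) -> float:
    """Mean squared error between the input and the re-degraded output."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / max(len(a), 1)


def forward(degraded: Waveform) -> Tuple[Waveform, Waveform, float]:
    """Run analysis, synthesis, and channel, then score the reconstruction."""
    speech_features, distortion = analysis(degraded)
    restored = synthesis(speech_features)
    redegraded = channel(restored, distortion)
    return restored, redegraded, reconstruction_loss(degraded, redegraded)


def transfer_effect(source_degraded: Waveform, target: Waveform) -> Waveform:
    """Toy audio effect transfer: apply the source's distortion to another signal."""
    _, distortion = analysis(source_degraded)
    return channel(target, distortion)


if __name__ == "__main__":
    restored, redegraded, loss = forward([0.1, -0.25, 0.5])
    print(restored, redegraded, loss)
    print(transfer_effect([0.0, 0.5], [1.0, -1.0]))
```

The point of the sketch is the training signal: no clean reference is needed, because the loss compares the degraded input with the output of the channel stage. Reusing the channel stage on a different waveform is what the summary calls audio effect transfer.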