Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU

Bibliographic Details
Main Authors: Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Speech synthesis; neural vocoder; LPCNet; text-to-speech; singing voice synthesis
Online Access: https://ieeexplore.ieee.org/document/9455356/
author Keisuke Matsubara
Takuma Okamoto
Ryoichi Takashima
Tetsuya Takiguchi
Tomoki Toda
Yoshinori Shiga
Hisashi Kawai
collection DOAJ
description This paper investigates a real-time neural speech synthesis system on CPUs that can synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range audible by human beings. Although most previous studies on 48 kHz speech synthesis have used traditional source-filter vocoders or a WaveNet vocoder for waveform generation, they have some drawbacks regarding synthesis quality or inference speed. LPCNet was proposed as a real-time neural vocoder with a mobile CPU but its sampling frequency is still only 16 kHz. In this paper, we propose a Full-band LPCNet to synthesize high-fidelity 48 kHz speech waveforms with a CPU by introducing some simple but effective modifications to the conventional LPCNet. We then evaluate the synthesis quality using both normal speech and a singing voice. The results of these experiments demonstrate that the proposed Full-band LPCNet is the only neural vocoder that can synthesize high-quality 48 kHz speech waveforms while maintaining real-time capability with a CPU.
first_indexed 2024-12-20T01:53:17Z
format Article
id doaj.art-2b1161e52b18489e9d7ce9bb4a6bde4d
institution Directory of Open Access Journals
issn 2169-3536
language English
last_indexed 2024-12-20T01:53:17Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-2b1161e52b18489e9d7ce9bb4a6bde4d
2022-12-21T19:57:35Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
Volume 9, pages 94923-94933
DOI 10.1109/ACCESS.2021.3089565
Article number 9455356
Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU
Keisuke Matsubara (0), https://orcid.org/0000-0002-2935-668X, Graduate School of System Informatics, Kobe University, Kobe, Japan
Takuma Okamoto (1), https://orcid.org/0000-0001-9913-4647, National Institute of Information and Communications Technology, Kyoto, Japan
Ryoichi Takashima (2), https://orcid.org/0000-0002-9808-0250, Graduate School of System Informatics, Kobe University, Kobe, Japan
Tetsuya Takiguchi (3), https://orcid.org/0000-0001-5005-7679, Graduate School of System Informatics, Kobe University, Kobe, Japan
Tomoki Toda (4), https://orcid.org/0000-0001-8146-1279, National Institute of Information and Communications Technology, Kyoto, Japan
Yoshinori Shiga (5), National Institute of Information and Communications Technology, Kyoto, Japan
Hisashi Kawai (6), National Institute of Information and Communications Technology, Kyoto, Japan
https://ieeexplore.ieee.org/document/9455356/
Speech synthesis
neural vocoder
LPCNet
text-to-speech
singing voice synthesis
title Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU
topic Speech synthesis
neural vocoder
LPCNet
text-to-speech
singing voice synthesis
url https://ieeexplore.ieee.org/document/9455356/