Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU
This paper investigates a real-time neural speech synthesis system on CPUs that can synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range audible by human beings. Although most previous studies on 48 kHz speech synthesis have used traditional source-filter vocoders or...
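Main Authors: | Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai |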
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
Subjects: | Speech synthesis, neural vocoder, LPCNet, text-to-speech, singing voice synthesis |
Online Access: | https://ieeexplore.ieee.org/document/9455356/ |
_version_ | 1818922422153248768 |
---|---|
author | Keisuke Matsubara Takuma Okamoto Ryoichi Takashima Tetsuya Takiguchi Tomoki Toda Yoshinori Shiga Hisashi Kawai |
author_facet | Keisuke Matsubara Takuma Okamoto Ryoichi Takashima Tetsuya Takiguchi Tomoki Toda Yoshinori Shiga Hisashi Kawai |
author_sort | Keisuke Matsubara |
collection | DOAJ |
description | This paper investigates a real-time neural speech synthesis system on CPUs that can synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range audible by human beings. Although most previous studies on 48 kHz speech synthesis have used traditional source-filter vocoders or a WaveNet vocoder for waveform generation, they have some drawbacks regarding synthesis quality or inference speed. LPCNet was proposed as a real-time neural vocoder with a mobile CPU but its sampling frequency is still only 16 kHz. In this paper, we propose a Full-band LPCNet to synthesize high-fidelity 48 kHz speech waveforms with a CPU by introducing some simple but effective modifications to the conventional LPCNet. We then evaluate the synthesis quality using both normal speech and a singing voice. The results of these experiments demonstrate that the proposed Full-band LPCNet is the only neural vocoder that can synthesize high-quality 48 kHz speech waveforms while maintaining real-time capability with a CPU. |
first_indexed | 2024-12-20T01:53:17Z |
format | Article |
id | doaj.art-2b1161e52b18489e9d7ce9bb4a6bde4d |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-20T01:53:17Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-2b1161e52b18489e9d7ce9bb4a6bde4d; 2022-12-21T19:57:35Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 94923; 94933; 10.1109/ACCESS.2021.3089565; 9455356; Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU; Keisuke Matsubara (0, https://orcid.org/0000-0002-2935-668X); Takuma Okamoto (1, https://orcid.org/0000-0001-9913-4647); Ryoichi Takashima (2, https://orcid.org/0000-0002-9808-0250); Tetsuya Takiguchi (3, https://orcid.org/0000-0001-5005-7679); Tomoki Toda (4, https://orcid.org/0000-0001-8146-1279); Yoshinori Shiga (5); Hisashi Kawai (6); Graduate School of System Informatics, Kobe University, Kobe, Japan; National Institute of Information and Communications Technology, Kyoto, Japan; Graduate School of System Informatics, Kobe University, Kobe, Japan; Graduate School of System Informatics, Kobe University, Kobe, Japan; National Institute of Information and Communications Technology, Kyoto, Japan; National Institute of Information and Communications Technology, Kyoto, Japan; National Institute of Information and Communications Technology, Kyoto, Japan; https://ieeexplore.ieee.org/document/9455356/; Speech synthesis; neural vocoder; LPCNet; text-to-speech; singing voice synthesis |
spellingShingle | Keisuke Matsubara Takuma Okamoto Ryoichi Takashima Tetsuya Takiguchi Tomoki Toda Yoshinori Shiga Hisashi Kawai Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU IEEE Access Speech synthesis neural vocoder LPCNet text-to-speech singing voice synthesis |
title | Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU |
title_full | Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU |
title_fullStr | Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU |
title_full_unstemmed | Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU |
title_short | Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU |
title_sort | full band lpcnet a real time neural vocoder for 48 khz audio with a cpu |
topic | Speech synthesis neural vocoder LPCNet text-to-speech singing voice synthesis |
url | https://ieeexplore.ieee.org/document/9455356/ |