PeriodNet: A Non-Autoregressive Raw Waveform Generative Model With a Structure Separating Periodic and Aperiodic Components

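This paper presents PeriodNet, a non-autoregressive (non-AR) waveform generative model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR raw waveform generative models have enabled the fast generation of high-quality waveforms. However, the variations of waveforms that these models can reconstruct are limited by training data. In addition, typical non-AR models reconstruct a speech waveform from a single Gaussian input despite the mixture of periodic and aperiodic signals in speech. These may significantly affect the waveform generation process in some applications such as singing voice synthesis systems, which require reproducing accurate pitch and natural sounds with less periodicity, including husky and breath sounds. PeriodNet uses a parallel or series model structure to model a speech waveform to tackle these problems. Two sub-generators connected in parallel or in series take an explicit periodic and aperiodic signal (sine wave and Gaussian noise) as an input. Since PeriodNet models periodic and aperiodic components by focusing on whether these input signals are autocorrelated or not, it does not require external periodic/aperiodic decomposition during training. Experimental results show that our proposed structure improves the naturalness of generated waveforms. We also show that speech waveforms with a pitch outside of the training data range can be generated with more naturalness.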
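
As a rough illustration of the structure described in this record's abstract (two sub-generators, connected in parallel or in series, driven by a sine wave and Gaussian noise), the following Python sketch builds the two explicit input signals and wires up placeholder generators. It is not the authors' implementation. The function names, the per-sample F0 input, the 16 kHz sample rate, the summing of outputs in the parallel case, and the way the series case passes the first output to the second generator are all assumptions made for illustration; the real sub-generators are neural networks, which are replaced here by trivial stand-in functions.

```python
import numpy as np


def sine_excitation(f0_hz, sample_rate):
    """Sine wave whose instantaneous frequency follows f0_hz (one value per sample).

    Samples with f0 <= 0 are treated as unvoiced and set to zero (an assumption).
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    # Integrate frequency over time to obtain the phase in radians.
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / sample_rate
    return np.where(f0_hz > 0.0, np.sin(phase), 0.0)


def gaussian_noise(num_samples, rng):
    """Zero-mean, unit-variance Gaussian noise (the aperiodic input)."""
    return rng.standard_normal(num_samples)


def parallel_model(periodic_gen, aperiodic_gen, sine, noise):
    # Assumption: each sub-generator sees one input and the outputs are summed.
    return periodic_gen(sine) + aperiodic_gen(noise)


def series_model(periodic_gen, aperiodic_gen, sine, noise):
    # Assumption: the second sub-generator receives the first output plus the noise.
    periodic = periodic_gen(sine)
    return aperiodic_gen(periodic, noise)


# Hypothetical stand-ins for the trained neural sub-generators.
def toy_periodic_gen(sine):
    return 0.8 * sine


def toy_aperiodic_gen(noise):
    return 0.05 * noise


def toy_series_aperiodic_gen(periodic, noise):
    return periodic + 0.05 * noise


sample_rate = 16000                    # illustrative value, not from the record
f0 = np.full(sample_rate, 220.0)       # one second of a constant 220 Hz pitch
rng = np.random.default_rng(0)

sine = sine_excitation(f0, sample_rate)
noise = gaussian_noise(len(f0), rng)

y_parallel = parallel_model(toy_periodic_gen, toy_aperiodic_gen, sine, noise)
y_series = series_model(toy_periodic_gen, toy_series_aperiodic_gen, sine, noise)
```

Because the pitch enters only through the sine input, changing `f0` changes the periodic excitation directly, which is one plausible reading of why the abstract reports better behavior for pitches outside the training range.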


Bibliographic Details
Main Authors:	Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Generative adversarial network; neural vocoder; signal processing; singing voice synthesis; waveform generative model
Online Access:	https://ieeexplore.ieee.org/document/9559963/

_version_	1831761393749065728
author	Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
author_facet	Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
author_sort	Yukiya Hono
collection	DOAJ
description	This paper presents PeriodNet, a non-autoregressive (non-AR) waveform generative model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR raw waveform generative models have enabled the fast generation of high-quality waveforms. However, the variations of waveforms that these models can reconstruct are limited by training data. In addition, typical non-AR models reconstruct a speech waveform from a single Gaussian input despite the mixture of periodic and aperiodic signals in speech. These may significantly affect the waveform generation process in some applications such as singing voice synthesis systems, which require reproducing accurate pitch and natural sounds with less periodicity, including husky and breath sounds. PeriodNet uses a parallel or series model structure to model a speech waveform to tackle these problems. Two sub-generators connected in parallel or in series take an explicit periodic and aperiodic signal (sine wave and Gaussian noise) as an input. Since PeriodNet models periodic and aperiodic components by focusing on whether these input signals are autocorrelated or not, it does not require external periodic/aperiodic decomposition during training. Experimental results show that our proposed structure improves the naturalness of generated waveforms. We also show that speech waveforms with a pitch outside of the training data range can be generated with more naturalness.
first_indexed	2024-12-22T04:48:07Z
format	Article
id	doaj.art-679dc822cf05493fbbd5e72bb4af426f
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-22T04:48:07Z
publishDate	2021-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
doi	10.1109/ACCESS.2021.3118033
volume	9
pages	137599-137612
document_id	9559963
timestamp	2022-12-21T18:38:34Z
author_orcid	Yukiya Hono (https://orcid.org/0000-0003-1245-8791); Shinji Takaki (https://orcid.org/0000-0001-7294-7699); Kei Hashimoto (https://orcid.org/0000-0003-2081-0396); Keiichi Tokuda (https://orcid.org/0000-0001-6143-0133)
affiliation	Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan (all six authors)
title	PeriodNet: A Non-Autoregressive Raw Waveform Generative Model With a Structure Separating Periodic and Aperiodic Components
topic	Generative adversarial network; neural vocoder; signal processing; singing voice synthesis; waveform generative model
url	https://ieeexplore.ieee.org/document/9559963/


Similar Items