Generative models for speech emotion synthesis

Several attempts have been made to synthesize speech from text. However, existing methods tend to generate speech that sounds artificial and lacks emotional content. In this project, we investigate using Generative Adversarial Networks (GANs) to generate emotional speech. WaveGAN (2019) was a fir...

Bibliographic Details
Main Author:	Raj, Nathanael S.
Other Authors:	Jagath C. Rajapakse
Format:	Final Year Project (FYP)
Language:	English
Published:	2019
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	http://hdl.handle.net/10356/76865

_version_	1826120208839868416
author	Raj, Nathanael S.
author2	Jagath C. Rajapakse
collection	NTU
description	Several attempts have been made to synthesize speech from text. However, existing methods tend to generate speech that sounds artificial and lacks emotional content. In this project, we investigate using Generative Adversarial Networks (GANs) to generate emotional speech. WaveGAN (2019) was a first attempt at generating speech using raw audio waveforms. It produced natural-sounding audio, including speech, bird chirps and drums. In this project, we applied WaveGAN to emotional speech data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), using all eight emotion categories. We modified WaveGAN with two advanced conditioning strategies: Sparse Vector Conditioning and Auxiliary Classifiers. In experiments conducted with human listeners, we found that these methods greatly helped subjects identify the generated emotions correctly, and improved the intelligibility and quality of the generated samples.
first_indexed	2024-10-01T05:12:46Z
format	Final Year Project (FYP)
id	ntu-10356/76865
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T05:12:46Z
publishDate	2019
record_format	dspace
spelling	ntu-10356/76865 2023-03-03T20:46:06Z School of Computer Science and Engineering Bachelor of Engineering (Computer Science) 2019-04-20T06:12:15Z 56 p. application/pdf
title	Generative models for speech emotion synthesis
topic	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
url	http://hdl.handle.net/10356/76865
work_keys_str_mv	AT rajnathanaels generativemodelsforspeechemotionsynthesis


Similar Items