Speech fusion to face : bridging the gap between human's vocal characteristics and facial imaging

While deep learning technologies can now generate images realistic enough to confuse humans, research efforts are turning to image synthesis for more concrete, application-specific purposes. Facial image generation based on vocal characteristics from speech is one such importa...

Bibliographic Details
Main Author:	Bai, Yeqi
Other Authors:	Wang Lipo
Format:	Final Year Project (FYP)
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/139255

_version_	1826119544648761344
author	Bai, Yeqi
author2	Wang Lipo
author_facet	Wang Lipo Bai, Yeqi
author_sort	Bai, Yeqi
collection	NTU
description	While deep learning technologies can now generate images realistic enough to confuse humans, research efforts are turning to image synthesis for more concrete, application-specific purposes. Facial image generation based on vocal characteristics from speech is one such important yet challenging task. It is a key enabler of influential use cases of image generation, especially for businesses in public security and entertainment. Existing solutions to the speech2face problem render limited image quality and fail to preserve facial similarity, owing to the lack of a quality training dataset and of appropriate integration of vocal features. In this paper, we investigate these key technical challenges and propose Speech Fusion to Face, or SF2F for short, which attempts to address the issue of facial image quality and the poor connection between the vocal feature domain and modern image generation models. By adopting new strategies and approaches, we demonstrate a dramatic performance boost over the state-of-the-art solution, doubling the recall of individual identity and lifting the quality score from 15 to 19, based on the mutual information score with a VGGFace classifier.
first_indexed	2024-10-01T05:01:46Z
format	Final Year Project (FYP)
id	ntu-10356/139255
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T05:01:46Z
publishDate	2020
publisher	Nanyang Technological University
record_format	dspace
spelling	ntu-10356/139255 2023-07-07T18:53:34Z Speech fusion to face : bridging the gap between human's vocal characteristics and facial imaging Bai, Yeqi Wang Lipo School of Electrical and Electronic Engineering Yitu Technology Zhang Zhenjie elpwang@ntu.edu.sg Engineering::Electrical and electronic engineering Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-18T07:06:27Z 2020-05-18T07:06:27Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/139255 en A3271-191 application/pdf Nanyang Technological University
title	Speech fusion to face : bridging the gap between human's vocal characteristics and facial imaging
topic	Engineering::Electrical and electronic engineering
url	https://hdl.handle.net/10356/139255

Similar Items