ReliTalk: relightable talking portrait generation from a single video

Recent years have witnessed great progress in creating vivid audio-driven portraits from monocular videos. However, how to seamlessly adapt the created video avatars to other scenarios with different backgrounds and lighting conditions remains unsolved. On the other hand, existing relighting studies...


Bibliographic Details
Main Authors:	Qiu, Haonan; Chen, Zhaoxi; Jiang, Yuming; Zhou, Hang; Fan, Xiangyu; Yang, Lei; Wu, Wayne; Liu, Ziwei
Other Authors:	College of Computing and Data Science
Format:	Journal Article
Language:	English
Published:	2024
Subjects:	Computer and Information Science; Relighting; Talking face
Online Access:	https://hdl.handle.net/10356/178290

_version_	1826115123847102464
author	Qiu, Haonan; Chen, Zhaoxi; Jiang, Yuming; Zhou, Hang; Fan, Xiangyu; Yang, Lei; Wu, Wayne; Liu, Ziwei
author2	College of Computing and Data Science
author_facet	College of Computing and Data Science; Qiu, Haonan; Chen, Zhaoxi; Jiang, Yuming; Zhou, Hang; Fan, Xiangyu; Yang, Lei; Wu, Wayne; Liu, Ziwei
author_sort	Qiu, Haonan
collection	NTU
description	Recent years have witnessed great progress in creating vivid audio-driven portraits from monocular videos. However, how to seamlessly adapt the created video avatars to other scenarios with different backgrounds and lighting conditions remains unsolved. On the other hand, existing relighting studies mostly rely on dynamically lighted or multi-view data, which are too expensive for creating video portraits. To bridge this gap, we propose ReliTalk, a novel framework for relightable audio-driven talking portrait generation from monocular videos. Our key insight is to decompose the portrait’s reflectance from implicitly learned audio-driven facial normals and images. Specifically, we incorporate 3D facial priors derived from audio features to predict delicate normal maps through implicit functions. These initially predicted normals then play a crucial part in reflectance decomposition by dynamically estimating the lighting condition of the given video. Moreover, the stereoscopic face representation is refined using the identity-consistent loss under simulated multiple lighting conditions, addressing the ill-posed problem caused by limited views available from a single monocular video. Extensive experiments validate the superiority of our proposed framework on both real and synthetic datasets. Our code is released at https://github.com/arthur-qiu/ReliTalk.
first_indexed	2024-10-01T03:50:05Z
format	Journal Article
id	ntu-10356/178290
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T03:50:05Z
publishDate	2024
record_format	dspace
spelling	ntu-10356/1782902024-06-11T01:03:29Z ReliTalk: relightable talking portrait generation from a single video Qiu, Haonan Chen, Zhaoxi Jiang, Yuming Zhou, Hang Fan, Xiangyu Yang, Lei Wu, Wayne Liu, Ziwei College of Computing and Data Science S-Lab Computer and Information Science Relighting Talking face Recent years have witnessed great progress in creating vivid audio-driven portraits from monocular videos. However, how to seamlessly adapt the created video avatars to other scenarios with different backgrounds and lighting conditions remains unsolved. On the other hand, existing relighting studies mostly rely on dynamically lighted or multi-view data, which are too expensive for creating video portraits. To bridge this gap, we propose ReliTalk, a novel framework for relightable audio-driven talking portrait generation from monocular videos. Our key insight is to decompose the portrait’s reflectance from implicitly learned audio-driven facial normals and images. Specifically, we involve 3D facial priors derived from audio features to predict delicate normal maps through implicit functions. These initially predicted normals then take a crucial part in reflectance decomposition by dynamically estimating the lighting condition of the given video. Moreover, the stereoscopic face representation is refined using the identity-consistent loss under simulated multiple lighting conditions, addressing the ill-posed problem caused by limited views available from a single monocular video. Extensive experiments validate the superiority of our proposed framework on both real and synthetic datasets. Our code is released in (https://github.com/arthur-qiu/ReliTalk). 
Agency for Science, Technology and Research (A*STAR) Ministry of Education (MOE) Nanyang Technological University National Research Foundation (NRF) This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-PhD-2022-01-035T), NTU NAP, MOE AcRF Tier 2 (MOET2EP20221-0012), and under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). 2024-06-11T01:03:28Z 2024-06-11T01:03:28Z 2024 Journal Article Qiu, H., Chen, Z., Jiang, Y., Zhou, H., Fan, X., Yang, L., Wu, W. & Liu, Z. (2024). ReliTalk: relightable talking portrait generation from a single video. International Journal of Computer Vision. https://dx.doi.org/10.1007/s11263-024-02007-9 0920-5691 https://hdl.handle.net/10356/178290 10.1007/s11263-024-02007-9 2-s2.0-85185108450 en AISG2-PhD-2022-01-035T NTU NAP MOET2EP20221-0012 IAF-ICP International Journal of Computer Vision © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. All rights reserved.
title	ReliTalk: relightable talking portrait generation from a single video
topic	Computer and Information Science; Relighting; Talking face
url	https://hdl.handle.net/10356/178290

Similar Items