Lip reading in profile


Bibliographic Details
Main Authors: Chung, J; Zisserman, A
Format: Conference item
Published: British Machine Vision Association and Society for Pattern Recognition, 2017

Abstract
There has been a quantum leap in the performance of automated lip reading recently, due to the application of neural network sequence models trained on a very large corpus of aligned text and face videos. However, this advance has only been demonstrated for frontal or near-frontal faces, and so the question remains: can lips be read in profile to the same standard? The objective of this paper is to answer that question. We make three contributions: first, we obtain a new large aligned training corpus that contains profile faces, selecting these using a face pose regressor network; second, we propose a curriculum learning procedure that is able to extend SyncNet [10] (a network to synchronize face movements and speech) progressively from frontal to profile faces; third, we demonstrate lip reading in profile for unseen videos. The trained model is evaluated on a held-out test set and is also shown to far surpass the state of the art on the OuluVS2 multi-view benchmark.
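
The idea of extending a model "progressively from frontal to profile faces" can be illustrated as a staged training pool ordered by head pose. This is a minimal sketch under stated assumptions, not the authors' procedure: the `Clip` type, the `curriculum_stages` helper, and the yaw thresholds are all hypothetical, and the per-clip yaw values stand in for whatever a face pose regressor would predict (0 degrees = frontal, +/-90 degrees = full profile).

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Clip:
    """Hypothetical face track with a predicted head yaw in degrees.

    The yaw is assumed to come from a face pose regressor:
    0 = frontal, +/-90 = full profile.
    """
    clip_id: str
    yaw_deg: float


def curriculum_stages(
    clips: Iterable[Clip], max_yaw_per_stage: Sequence[float]
) -> List[List[Clip]]:
    """Build the training pool for each curriculum stage (hypothetical helper).

    Stage k contains every clip whose |yaw| is at most max_yaw_per_stage[k],
    so each stage keeps the easier, more frontal clips from earlier stages
    and adds progressively more profile-like ones.
    """
    thresholds = list(max_yaw_per_stage)
    if thresholds != sorted(thresholds):
        raise ValueError("stage thresholds must be non-decreasing")
    clips = list(clips)
    return [[c for c in clips if abs(c.yaw_deg) <= t] for t in thresholds]


if __name__ == "__main__":
    # Illustrative yaw values only; not data from the paper.
    clips = [Clip("a", 5.0), Clip("b", -40.0), Clip("c", 70.0), Clip("d", -88.0)]
    thresholds = [30.0, 60.0, 90.0]  # hypothetical stage limits
    for t, pool in zip(thresholds, curriculum_stages(clips, thresholds)):
        print(t, [c.clip_id for c in pool])
```

Making the pools cumulative, rather than training each stage on a disjoint pose band, is one design choice that keeps frontal examples in play as harder poses are introduced; the actual schedule, thresholds, and training loop in the paper may differ.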

Institution: University of Oxford
Record ID: oxford-uuid:9f06858c-349c-416f-8ace-87751cd401fc