Lip reading in profile
There has been a quantum leap in the performance of automated lip reading recently due to the application of neural network sequence models trained on a very large corpus of aligned text and face videos. However, this advance has only been demonstrated for frontal or near frontal faces, and so the q...
Main authors: | Chung, J; Zisserman, A |
---|---|
Format: | Conference item |
Published: | British Machine Vision Association and Society for Pattern Recognition, 2017 |
_version_ | 1826287928233426944 |
---|---|
author | Chung, J; Zisserman, A |
collection | OXFORD |
description | There has been a quantum leap in the performance of automated lip reading recently due to the application of neural network sequence models trained on a very large corpus of aligned text and face videos. However, this advance has only been demonstrated for frontal or near frontal faces, and so the question remains: can lips be read in profile to the same standard? The objective of this paper is to answer that question. We make three contributions: first, we obtain a new large aligned training corpus that contains profile faces, and select these using a face pose regressor network; second, we propose a curriculum learning procedure that is able to extend SyncNet [10] (a network to synchronize face movements and speech) progressively from frontal to profile faces; third, we demonstrate lip reading in profile for unseen videos. The trained model is evaluated on a held out test set, and is also shown to far surpass the state of the art on the OuluVS2 multi-view benchmark. |
first_indexed | 2024-03-07T02:06:02Z |
format | Conference item |
id | oxford-uuid:9f06858c-349c-416f-8ace-87751cd401fc |
institution | University of Oxford |
last_indexed | 2024-03-07T02:06:02Z |
publishDate | 2017 |
publisher | British Machine Vision Association and Society for Pattern Recognition |
record_format | dspace |
spelling | oxford-uuid:9f06858c-349c-416f-8ace-87751cd401fc; 2022-03-27T00:54:19Z; Lip reading in profile; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:9f06858c-349c-416f-8ace-87751cd401fc; Symplectic Elements at Oxford; British Machine Vision Association and Society for Pattern Recognition; 2017; Chung, J; Zisserman, A |
title | Lip reading in profile |