Speech recognition models are strong lip-readers

In this work, we show that a large pre-trained ASR model can be adapted to perform lip-reading. Our method enables an ASR model like Whisper to interpret lip movements in a video and output text transcriptions. We achieve this by learning a cross-modal mapping from a lip sequence to a speech sequenc...

Bibliographic Details
Main Authors:	Prajwal, KR, Afouras, T, Zisserman, A
Format:	Conference item
Language:	English
Published:	ISCA 2024

Similar Items

Sub-word level lip reading with visual attention
by: Prajwal, KR, et al.
Published: (2022)

My lips are concealed: audio-visual speech enhancement through obstructions
by: Afouras, T, et al.
Published: (2019)

Deep lip reading: a comparison of models and an online application
by: Afouras, T, et al.
Published: (2018)

Deep audio-visual speech recognition
by: Afouras, T, et al.
Published: (2018)

ASR is all you need: cross-modal distillation for lip reading
by: Afouras, T, et al.
Published: (2020)

Visual keyword spotting with attention
by: Prajwal, KR, et al.
Published: (2022)

A novel lip geometry approach for audio-visual speech recognition
by: Mohd Zamri, Ibrahim
Published: (2014)

A lip geometry approach for feature-fusion based audio-visual speech recognition
by: M. Z., Ibrahim, et al.
Published: (2014)

Lip reading in the wild
by: Chung, J, et al.
Published: (2017)

Lip reading in profile
by: Chung, J, et al.
Published: (2017)

Reading to listen at the cocktail party: multi-modal speech separation
by: Rahimi, A, et al.
Published: (2022)

Feature-Fusion based Audio-Visual Speech Recognition using Lip Geometry Features in Noisy Environment
by: M. Z., Ibrahim, et al.
Published: (2015)

Lip Reading Sentences in the Wild
by: Chung, J, et al.
Published: (2017)

Weakly-supervised fingerspelling recognition in British Sign Language videos
by: Prajwal, KR, et al.
Published: (2022)

3D lips development and measurement for visual speech synthesis
by: Salleh, Siti Salwa, et al.
Published: (2009)

Infrastructure development for integration of lip reading into the SUMMIT Speech Recognizer
by: La, Chia-Hao, 1980-
Published: (2006)

Learning to lip read words by watching videos
by: Chung, J, et al.
Published: (2018)

Out of time: automated lip sync in the wild
by: Chung, J, et al.
Published: (2017)

Speech Representation Models for Speech Synthesis and Multimodal Speech Recognition
by: Sun, Felix (Felix W.)
Published: (2017)

Speeches for readers: motives and contexts for the circulation of Ciceronian oratory
by: Clark, AM
Published: (2020)

Statistical modeling for speech recognition
by: Khalifa, Othman Omran, et al.
Published: (2013)

Robust speech features and acoustic models for speech recognition
by: Xiao, Xiong
Published: (2010)

Emotion recognition in speech using cross-modal transfer in the wild
by: Albanie, S, et al.
Published: (2018)

Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
by: Albanie, S, et al.
Published: (2018)

Analysis and modeling of non-native speech for automatic speech recognition
by: Livescu, Karen, 1975-
Published: (2013)

Psychoacoustic model for robust speech recognition
by: Luo, Xue Wen
Published: (2010)

Optimizing model training for speech recognition
by: Chak, Hui Ping
Published: (2010)

Subword lexical modelling for speech recognition
by: Lau, Raymond, 1971-
Published: (2009)

Speech recognition by machine
by: Ainsworth, William A.
Published: (1988)

Speech2Action: Cross-modal supervision for action recognition
by: Nagrani, A, et al.
Published: (2020)

Impaired speech recognition
by: Lam, Michelle Su-Ann
Published: (2024)

Command by speech recognition
by: Ardian Syah, Mohd Yusof
Published: (2008)

Speech recognition tool
by: Lin, Weixiong.
Published: (2009)

Speech recognition and synthesis
by: Kang, Yi Da
Published: (2023)

On the recognition of speech by machine
by: Hughes, George W.
Published: (2023)

The recognition of speech by machine
Published: (2004)

Speech synthesis and recognition
by: Holmes, J. N.
Published: (1988)

Fundamentals of speech recognition
by: Rabiner, Lawrence R., 1943-, et al.
Published: (1993)

The development of face recognition based on lip using neural network
by: Mohd Nor, Halizawati
Published: (2006)

Prosodic modeling for improved speech recognition and understanding
by: Wang, Chao, 1972-
Published: (2014)