Continuous lipreading based on acoustic temporal alignments

Abstract Visual speech recognition (VSR) is a challenging task that has received increasing interest during the last few decades. Current state of the art employs powerful end-to-end architectures based on deep learning which depend on large amounts of data and high computational resources for their...

Full description

Bibliographic Details
Main Authors: David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
Format: Article
Language:English
Published: SpringerOpen 2024-05-01
Series:EURASIP Journal on Audio, Speech, and Music Processing
Subjects:
Online Access:https://doi.org/10.1186/s13636-024-00345-7