AutoAD III: the prequel – back to the pixels

Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names. Currently, visual language models for AD generation are limited by a lack of suitable training data, and also their evaluation is ham...

Bibliographic Details
Main Authors: Han, T, Bain, M, Nagrani, A, Varol, G, Xie, W, Zisserman, A
Format: Conference item
Language: English
Published: IEEE 2024