AutoAD III: the prequel – back to the pixels

Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names. Currently, visual language models for AD generation are limited by a lack of suitable training data, and also their evaluation is ham...

Bibliographic Details
Main Authors:	Han, T; Bain, M; Nagrani, A; Varol, G; Xie, W; Zisserman, A
Format:	Conference item
Language:	English
Published:	IEEE 2024

Similar Items