AutoAD III: the prequel – back to the pixels
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names. Currently, visual language models for AD generation are limited by a lack of suitable training data, and also their evaluation is ham...
Main Authors: | Han, T, Bain, M, Nagrani, A, Varol, G, Xie, W, Zisserman, A |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2024 |
Similar Items
- Combining scene and auto-calibration constraints
  by: Liebowitz, D, et al.
  Published: (2002)
- Learning heterogeneous reaction kinetics from X-ray videos pixel by pixel
  by: Zhao, Hongbo, et al.
  Published: (2024)
- Linear auto-calibration for ground plane motion
  by: Knight, J, et al.
  Published: (2003)
- Code auto-completion by intelligent auto-copy-paste-modify
  by: Arief Setiawan
  Published: (2014)
- Development and characterisation of novel silicon pixel detectors for tracking and timing
  by: Gaži, M
  Published: (2024)