AutoAD III: the prequel – back to the pixels
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names. Currently, visual language models for AD generation are limited by a lack of suitable training data, and also their evaluation is ham...
Main Authors: | , , , , , |
---|---|
Format: | Conference item |
Language: | English |
Published: |
IEEE
2024
|