AutoAD-Zero: a training-free framework for zero-shot audio description

Our objective is to generate Audio Descriptions (ADs) for both movies and TV series in a training-free manner. We leverage off-the-shelf Visual-Language Models (VLMs) and Large Language Models (LLMs), and develop visual and text prompting strategies for this task. Our contributions are three...


Bibliographic details
Main Authors: Xie, J, Han, T, Bain, M, Nagrani, A, Varol, G, Xie, W, Zisserman, A
Format: Conference item
Language: English
Published: Springer 2024