AutoAD-Zero: a training-free framework for zero-shot audio description
Our objective is to generate Audio Descriptions (ADs) for both movies and TV series in a training-free manner. We leverage off-the-shelf Visual-Language Models (VLMs) and Large Language Models (LLMs), and develop visual and text prompting strategies for this task. Our contributions are three...
| Main Authors: | |
|---|---|
| Format: | Conference item |
| Language: | English |
| Published: | Springer, 2024 |