Look, listen and recognise: character-aware audio-visual subtitling
The goal of this paper is automatic character-aware subtitle generation. Given a video and a minimal amount of metadata, we propose an audio-visual method that generates a full transcript of the dialogue with precise speech timestamps and identifies the speaking character. The key idea is to first...
Main Authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2024 |