AutoAD: movie description in context

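The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form. Generating high-quality movie AD is challenging due to the dependency of the descriptions on context, and the limited amount of training data available. In this work, we leverage the power of pretrained foundation models, such as GPT and CLIP, and only train a mapping network that bridges the two models for visually-conditioned text generation. In order to obtain high-quality AD, we make the following four contributions: (i) we incorporate context from the movie clip, AD from previous clips, as well as the subtitles; (ii) we address the lack of training data by pretraining on large-scale datasets, where visual or contextual information is unavailable, e.g. text-only AD without movies or visual captioning datasets without context; (iii) we improve on the currently available AD datasets, by removing label noise in the MAD dataset, and adding character naming information; and (iv) we obtain strong results on the movie AD task compared with previous methods.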
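
The abstract's central design, frozen pretrained models (CLIP for vision, GPT for text) joined by a small trainable mapping network, can be illustrated with a minimal pure-Python sketch. It is a simplified stand-in under stated assumptions, not the authors' implementation: it assumes a single pooled CLIP embedding and a single linear projection, and the function names (`init_mapping`, `visual_prefix`, `assemble_input`), the toy dimensions, and the ordering that places previous AD and subtitle tokens before the visual prefix are all hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def init_mapping(in_dim: int, out_dim: int, seed: int = 0) -> Matrix:
    """Randomly initialised (out_dim x in_dim) weights: the only trainable part in this sketch."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def visual_prefix(clip_emb: Vector, w: Matrix, prefix_len: int, lm_dim: int) -> List[Vector]:
    """Map one (frozen) visual embedding to `prefix_len` pseudo-token embeddings for the language model."""
    flat = matvec(w, clip_emb)  # length prefix_len * lm_dim
    return [flat[i * lm_dim:(i + 1) * lm_dim] for i in range(prefix_len)]


def assemble_input(context: List[Vector], prefix: List[Vector], target: List[Vector]) -> List[Vector]:
    """Hypothetical ordering: [previous AD + subtitle tokens] + [visual prefix] + [AD tokens to generate]."""
    return context + prefix + target


# Toy dimensions, chosen only to keep the example small (assumed, not from the paper).
CLIP_DIM, LM_DIM, PREFIX_LEN = 8, 4, 3
w = init_mapping(CLIP_DIM, PREFIX_LEN * LM_DIM)
prefix = visual_prefix([0.1] * CLIP_DIM, w, PREFIX_LEN, LM_DIM)
context = [[0.0] * LM_DIM for _ in range(5)]  # stand-in for embedded previous AD and subtitles
target = [[0.0] * LM_DIM for _ in range(2)]   # stand-in for embedded AD tokens
sequence = assemble_input(context, prefix, target)
print(len(sequence), len(sequence[0]))  # 10 4
```

In a real system of this kind, only the mapping weights `w` would be updated during training while the CLIP and GPT parameters stay fixed; the single linear layer here could be swapped for a deeper network without changing the overall flow.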

Bibliographic Details
Main Authors:	Han, T, Bain, M, Nagrani, A, Varol, G, Xie, W, Zisserman, A
Format:	Conference item
Language:	English
Published:	IEEE 2023

_version_	1826310698411491328
author	Han, T Bain, M Nagrani, A Varol, G Xie, W Zisserman, A
author_sort	Han, T
collection	OXFORD
description	The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form. Generating high-quality movie AD is challenging due to the dependency of the descriptions on context, and the limited amount of training data available. In this work, we leverage the power of pretrained foundation models, such as GPT and CLIP, and only train a mapping network that bridges the two models for visually-conditioned text generation. In order to obtain high-quality AD, we make the following four contributions: (i) we incorporate context from the movie clip, AD from previous clips, as well as the subtitles; (ii) we address the lack of training data by pretraining on large-scale datasets, where visual or contextual information is unavailable, e.g. text-only AD without movies or visual captioning datasets without context; (iii) we improve on the currently available AD datasets, by removing label noise in the MAD dataset, and adding character naming information; and (iv) we obtain strong results on the movie AD task compared with previous methods.
first_indexed	2024-03-07T07:55:48Z
format	Conference item
id	oxford-uuid:4a657b01-d549-49e4-94bf-1d45417c045e
institution	University of Oxford
language	English
last_indexed	2024-03-07T07:55:48Z
publishDate	2023
publisher	IEEE
record_format	dspace
spelling	oxford-uuid:4a657b01-d549-49e4-94bf-1d45417c045e 2023-08-23T08:27:34Z AutoAD: movie description in context Conference item http://purl.org/coar/resource_type/c_5794 uuid:4a657b01-d549-49e4-94bf-1d45417c045e English Symplectic Elements IEEE 2023 Han, T Bain, M Nagrani, A Varol, G Xie, W Zisserman, A The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form. Generating high-quality movie AD is challenging due to the dependency of the descriptions on context, and the limited amount of training data available. In this work, we leverage the power of pretrained foundation models, such as GPT and CLIP, and only train a mapping network that bridges the two models for visually-conditioned text generation. In order to obtain high-quality AD, we make the following four contributions: (i) we incorporate context from the movie clip, AD from previous clips, as well as the subtitles; (ii) we address the lack of training data by pretraining on large-scale datasets, where visual or contextual information is unavailable, e.g. text-only AD without movies or visual captioning datasets without context; (iii) we improve on the currently available AD datasets, by removing label noise in the MAD dataset, and adding character naming information; and (iv) we obtain strong results on the movie AD task compared with previous methods.
title	AutoAD: movie description in context
title_sort	autoad movie description in context
work_keys_str_mv	AT hant autoadmoviedescriptionincontext AT bainm autoadmoviedescriptionincontext AT nagrania autoadmoviedescriptionincontext AT varolg autoadmoviedescriptionincontext AT xiew autoadmoviedescriptionincontext AT zissermana autoadmoviedescriptionincontext


Similar Items