Personalised CLIP or: how to find your vacation videos

Bibliographic Details
Main Authors: Korbar, B, Zisserman, A
Format: Conference item
Language: English
Published: British Machine Vision Association 2022
Description: In this paper, our goal is a person-centric model capable of retrieving the image or video corresponding to a personalized compound query from a large set of images or videos. Specifically, given a query consisting of an image of a person's face and a text scene description or action description, we retrieve images or video clips corresponding to this compound query. We make three contributions: (1) we propose a model that is able to retrieve images or videos given a personalized compound query. We achieve this by building on a pre-trained CLIP vision-text model that has compound, but general, query capabilities, and provide a mechanism to personalize it to the target person specified by their face; (2) we share a new Celebrities in Action dataset of movies with automatically generated annotations for identities, locations, and actions that can be used for evaluation of the compound-retrieval task; (3) we evaluate our model's performance on two datasets: Celebrities in Places for compound queries of a celebrity and a scene description; and our new Celebrities in Action dataset for compound queries of a celebrity and an action description. We demonstrate the flexibility of the model with free-form queries and compare to previous methods.
Identifier: oxford-uuid:1763b4e3-9623-4663-9140-bd5dcefd2f57
Institution: University of Oxford