Personalised CLIP or: how to find your vacation videos
In this paper, our goal is a person-centric model capable of retrieving the image or video corresponding to a personalized compound query from a large set of images or videos. Specifically, given a query consisting of an image of a person's face and a text scene description or...
Main Authors: | Korbar, B, Zisserman, A |
---|---|
Format: | Conference item |
Language: | English |
Published: | British Machine Vision Association, 2022 |
_version_ | 1797108381881729024 |
---|---|
author | Korbar, B Zisserman, A |
author_facet | Korbar, B Zisserman, A |
author_sort | Korbar, B |
collection | OXFORD |
description | In this paper, our goal is a person-centric model capable of retrieving the image or video corresponding to a personalized compound query from a large set of images or videos. Specifically, given a query consisting of an image of a person's face and a text scene description or action description, we retrieve images or video-clips corresponding to this compound query. We make three contributions: (1) we propose \model, a model that is able to retrieve images/video given a personalized compound-query. We achieve this by building on a pre-trained CLIP vision-text model that has compound, but general, query capabilities, and provide a mechanism to personalize it to the target person specified by their face; (2) we share a new Celebrities in Action (\dset) dataset of movies with automatically generated annotations for identities, locations, and actions that can be used for evaluation of the compound-retrieval task; (3) we evaluate our model's performance on two datasets: Celebrities in Places for compound queries of a celebrity and a scene description; and our new \dset for compound queries of a celebrity and an action description. We demonstrate the flexibility of the model with free-form queries and compare to previous methods. |
first_indexed | 2024-03-07T07:28:22Z |
format | Conference item |
id | oxford-uuid:1763b4e3-9623-4663-9140-bd5dcefd2f57 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:28:22Z |
publishDate | 2022 |
publisher | British Machine Vision Association |
record_format | dspace |
spelling | oxford-uuid:1763b4e3-9623-4663-9140-bd5dcefd2f57 2022-12-19T16:31:25Z Personalised CLIP or: how to find your vacation videos Conference item http://purl.org/coar/resource_type/c_5794 uuid:1763b4e3-9623-4663-9140-bd5dcefd2f57 English Symplectic Elements British Machine Vision Association 2022 Korbar, B Zisserman, A In this paper, our goal is a person-centric model capable of retrieving the image or video corresponding to a personalized compound query from a large set of images or videos. Specifically, given a query consisting of an image of a person's face and a text scene description or action description, we retrieve images or video-clips corresponding to this compound query. We make three contributions: (1) we propose \model, a model that is able to retrieve images/video given a personalized compound-query. We achieve this by building on a pre-trained CLIP vision-text model that has compound, but general, query capabilities, and provide a mechanism to personalize it to the target person specified by their face; (2) we share a new Celebrities in Action (\dset) dataset of movies with automatically generated annotations for identities, locations, and actions that can be used for evaluation of the compound-retrieval task; (3) we evaluate our model's performance on two datasets: Celebrities in Places for compound queries of a celebrity and a scene description; and our new \dset for compound queries of a celebrity and an action description. We demonstrate the flexibility of the model with free-form queries and compare to previous methods. |
spellingShingle | Korbar, B Zisserman, A Personalised CLIP or: how to find your vacation videos |
title | Personalised CLIP or: how to find your vacation videos |
title_full | Personalised CLIP or: how to find your vacation videos |
title_fullStr | Personalised CLIP or: how to find your vacation videos |
title_full_unstemmed | Personalised CLIP or: how to find your vacation videos |
title_short | Personalised CLIP or: how to find your vacation videos |
title_sort | personalised clip or how to find your vacation videos |
work_keys_str_mv | AT korbarb personalisedcliporhowtofindyourvacationvideos AT zissermana personalisedcliporhowtofindyourvacationvideos |