Is an object-centric video representation beneficial for transfer?

The objective of this work is to learn an object-centric video representation, with the aim of improving transferability to novel tasks, i.e., tasks different from the pre-training task of action classification. To this end, we introduce a new object-centric video recognition model based on a transf...

Bibliographic Details
Main Authors: Zhang, C, Gupta, A, Zisserman, A
Format: Conference item
Language: English
Published: Springer 2023
author Zhang, C
Gupta, A
Zisserman, A
collection OXFORD
description The objective of this work is to learn an object-centric video representation, with the aim of improving transferability to novel tasks, i.e., tasks different from the pre-training task of action classification. To this end, we introduce a new object-centric video recognition model based on a transformer architecture. The model learns a set of object-centric summary vectors for the video, and uses these vectors to fuse the visual and spatio-temporal trajectory ‘modalities’ of the video clip. We also introduce a novel trajectory contrast loss to further enhance objectness in these summary vectors. With experiments on four datasets (SomethingSomething-V2, SomethingElse, Action Genome and EpicKitchens), we show that the object-centric model outperforms prior video representations (both object-agnostic and object-aware), when: (1) classifying actions on unseen objects and unseen environments; (2) low-shot learning of novel classes; (3) linear probe to other downstream tasks; as well as (4) for standard action classification.
first_indexed 2024-03-07T07:54:00Z
format Conference item
id oxford-uuid:15807a37-40dc-478f-8388-8bd958622bc7
institution University of Oxford
language English
last_indexed 2024-03-07T07:54:00Z
publishDate 2023
publisher Springer
record_format dspace
title Is an object-centric video representation beneficial for transfer?