On compositions of transformations in contrastive self-supervised learning

In the image domain, excellent representations can be learned by inducing invariance to content-preserving transformations via noise contrastive learning. In this paper, we generalize contrastive learning to a wider set of transformations, and their compositions, for which either invariance or disti...

Bibliographic Details
Main Authors:	Asano, YM, Patrick, M, Kuznetsova, P, Fong, R, Henriques, JF, Zweig, G, Vedaldi, A
Format:	Conference item
Language:	English
Published:	IEEE 2022

author	Asano, YM; Patrick, M; Kuznetsova, P; Fong, R; Henriques, JF; Zweig, G; Vedaldi, A
collection	OXFORD
description	In the image domain, excellent representations can be learned by inducing invariance to content-preserving transformations via noise contrastive learning. In this paper, we generalize contrastive learning to a wider set of transformations, and their compositions, for which either invariance or distinctiveness is sought. We show that it is not immediately obvious how existing methods such as SimCLR can be extended to do so. Instead, we introduce a number of formal requirements that all contrastive formulations must satisfy, and propose a practical construction which satisfies these requirements. In order to maximise the reach of this analysis, we express all components of noise contrastive formulations as the choice of certain generalized transformations of the data (GDTs), including data sampling. We then consider videos as an example of data in which a large variety of transformations are applicable, accounting for the extra modalities – for which we analyze audio and text – and the dimension of time. We find that being invariant to certain transformations and distinctive to others is critical to learning effective video representations, improving the state-of-the-art for multiple benchmarks by a large margin, and even surpassing supervised pretraining. Code and pretrained models are available.
first_indexed	2024-03-07T06:32:52Z
format	Conference item
id	oxford-uuid:f69cb80b-7a83-4920-9254-be845a7885f8
institution	University of Oxford
language	English
last_indexed	2024-03-07T06:32:52Z
publishDate	2022
publisher	IEEE
record_format	dspace
spelling	oxford-uuid:f69cb80b-7a83-4920-9254-be845a7885f8 | 2022-03-27T12:36:21Z | Conference item | http://purl.org/coar/resource_type/c_5794 | uuid:f69cb80b-7a83-4920-9254-be845a7885f8 | English | Symplectic Elements | IEEE | 2022
title	On compositions of transformations in contrastive self-supervised learning

Similar Items