Towards General-purpose Vision via Multiview Contrastive Learning

Representation learning plays a key role in building robust, general-purpose vision learners and is a long-standing problem; it becomes increasingly interesting with the continuing explosion of data in our era. However, most previous approaches rely on specific, hand-crafted strategies that do not generalize. This thesis instead proposes and studies multiview contrastive learning (MCL), which is based on a simple mathematical principle: discriminating between samples from the joint distribution and samples from the product of marginals. We first introduce the general framework of multiview contrastive learning and demonstrate that this simple framework handles a wide range of representation learning problems, often advancing the state of the art. We then seek to understand the role of view selection in multiview contrastive learning from an information-theoretic point of view, arriving at an "InfoMin" principle that connects to minimal sufficient statistics and information bottlenecks. This principle is further demonstrated by supervised contrastive learning, which rivals or even beats supervised cross-entropy training on standard image classification benchmarks. In the last part, we discuss other applications of multiview contrastive learning (such as knowledge distillation) and improvements to it (e.g., how to improve its efficiency on uncurated data).
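The objective the abstract describes, discriminating samples from the joint distribution against samples from the product of marginals, is commonly instantiated as an InfoNCE-style loss over paired views. The following is a minimal sketch of that general form, not the thesis's own implementation; the function name and the temperature value are illustrative assumptions.

# Sketch of an InfoNCE-style multiview contrastive loss (illustrative only).
# Positive pairs are two views of the same sample (samples from the joint
# distribution); negatives pair views of different samples (approximating
# the product of marginals).
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: [N, D] embeddings of two views of the same N samples."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                              # [N, N] similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)    # positives lie on the diagonal
    # Symmetrize so each view serves as the anchor once.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))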

Bibliographic Details
Main Author: Tian, Yonglong
Other Authors: Isola, Phillip
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Rights: In Copyright - Educational Use Permitted (http://rightsstatements.org/page/InC-EDU/1.0/)
Online Access: https://hdl.handle.net/1721.1/150229