Towards General-purpose Vision via Multiview Contrastive Learning

Representation learning plays a key role in building robust, general-purpose vision learners and is a long-standing problem; it becomes increasingly interesting with the continuing explosion of data in our era. However, most previous approaches rely on specific, hand-crafted strategies that do not generalize. This thesis instead proposes and studies multiview contrastive learning (MCL), which is based on a simple mathematical principle: discriminating between samples from the joint distribution and samples from the product of marginals. We first introduce the general framework of multiview contrastive learning and demonstrate that this simple framework handles a wide range of representation learning problems, often advancing the state of the art. We then seek to understand the role of view selection in multiview contrastive learning from an information-theoretic point of view, arriving at an "InfoMin" principle that connects to minimal sufficient statistics and information bottlenecks. This principle is further demonstrated by supervised contrastive learning, which rivals or even beats supervised cross-entropy training on standard image classification benchmarks. In the last part, we discuss other applications of multiview contrastive learning (such as knowledge distillation) and improvements to it (e.g., how to improve its efficiency on uncurated data).
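The objective the abstract describes, discriminating samples from the joint distribution against samples from the product of marginals, is commonly instantiated as an InfoNCE-style loss over paired views. The following is a minimal sketch of that general form, not the thesis's own implementation; the function name and the temperature value are illustrative assumptions.

# Sketch of an InfoNCE-style multiview contrastive loss (illustrative only).
# Positive pairs are two views of the same sample (samples from the joint
# distribution); negatives pair views of different samples (approximating
# the product of marginals).
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: [N, D] embeddings of two views of the same N samples."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                              # [N, N] similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)    # positives lie on the diagonal
    # Symmetrize so each view serves as the anchor once.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))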

Bibliographic Details
Main Author: Tian, Yonglong
Other Authors: Isola, Phillip
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Rights: In Copyright - Educational Use Permitted (http://rightsstatements.org/page/InC-EDU/1.0/)
Online Access: https://hdl.handle.net/1721.1/150229