Towards General-purpose Vision via Multiview Contrastive Learning
Representation learning plays a key role in building robust and general-purpose vision learners, and is a long-standing problem. It becomes increasingly interesting with the continuing explosion of data in our era. However, most previous approaches are based on specific designs of strategies that are not generalizable...
Main Author: | Tian, Yonglong |
---|---|
Other Authors: | Isola, Phillip |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/150229 |
author | Tian, Yonglong |
author2 | Isola, Phillip |
collection | MIT |
description | Representation learning plays a key role in building robust and general-purpose vision learners, and is a long-standing problem. It becomes increasingly interesting with the continuing explosion of data in our era.
However, most previous approaches rely on specific, hand-designed strategies that do not generalize. This thesis instead proposes and studies multiview contrastive learning, which is based on a simple mathematical principle -- discriminating between samples from the joint distribution and samples from the product of marginals. We first introduce the general framework of multiview contrastive learning (MCL) and demonstrate that this simple framework can handle a variety of representation learning problems, often advancing the state of the art. We then study the role of view selection in multiview contrastive learning from an information-theoretic point of view and derive an "InfoMin" principle, which connects to minimal sufficient statistics and information bottlenecks. This principle is further demonstrated by supervised contrastive learning, which rivals or even beats supervised cross-entropy training on standard image classification benchmarks. In the last part, we discuss other applications of multiview contrastive learning (such as knowledge distillation) and improvements to it (e.g., how to improve its efficiency on uncurated data). |
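The abstract's core principle -- discriminating between samples from the joint distribution and samples from the product of marginals -- is commonly instantiated as an InfoNCE-style contrastive objective. The sketch below is an illustrative NumPy implementation of that general idea, not code from the thesis; the function name, temperature value, and batch construction are assumptions for demonstration only.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch, not thesis code).

    z1, z2: (n, d) embeddings of two views of the same n samples.
    Row i of z1 paired with row i of z2 is a "joint" (positive) pair;
    all cross-row pairings act as "product of marginals" (negatives).
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (n, n) similarity matrix
    # Softmax cross-entropy with the diagonal (positives) as targets
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In practice the two views would come from augmentations of the same image and the embeddings from an encoder network; here random arrays stand in for them, so the loss is low only when the two views are actually aligned.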
format | Thesis |
id | mit-1721.1/150229 |
institution | Massachusetts Institute of Technology |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/150229 2023-04-01T03:11:58Z Towards General-purpose Vision via Multiview Contrastive Learning Tian, Yonglong Isola, Phillip Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D. 2023-03-31T14:41:08Z 2023-03-31T14:41:08Z 2023-02 2023-02-28T14:39:16.880Z Thesis https://hdl.handle.net/1721.1/150229 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
title | Towards General-purpose Vision via Multiview Contrastive Learning |
url | https://hdl.handle.net/1721.1/150229 |