Towards General-purpose Vision via Multiview Contrastive Learning

Bibliographic Details
Main Author: Tian, Yonglong
Other Authors: Isola, Phillip
Format: Thesis
Published: Massachusetts Institute of Technology, 2023
Online Access: https://hdl.handle.net/1721.1/150229
Description
Summary: Representation learning is a long-standing problem and plays a key role in building robust, general-purpose vision learners; it has only grown in importance with the continuing explosion of data in our era. However, most previous approaches rely on task-specific strategies that do not generalize. This thesis instead proposes and studies multiview contrastive learning (MCL), which is based on a simple mathematical principle: discriminating between samples from the joint distribution and samples from the product of marginals. We first introduce the general framework of multiview contrastive learning and demonstrate that this simple framework can handle a variety of representation learning problems, often advancing the state of the art. We then seek to understand the role of view selection in multiview contrastive learning from an information-theoretic point of view, arriving at an "InfoMin" principle that connects to minimal sufficient statistics and information bottlenecks. This principle is further demonstrated by supervised contrastive learning, which rivals or even beats supervised cross-entropy training on standard image classification benchmarks. In the last part, we discuss other applications of multiview contrastive learning (such as knowledge distillation) and improvements to it (e.g., how to improve its efficiency on uncurated data).
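
For reference, the joint-versus-marginals discrimination named in the summary is commonly instantiated as an InfoNCE-style contrastive objective. The sketch below is a standard textbook form rather than the thesis's exact formulation; the encoders f and g, temperature \tau, and the setup of one positive pair among N candidates are all illustrative assumptions:

\[
\mathcal{L}_{\mathrm{contrast}}
= -\,\mathbb{E}_{(x,\,y)\sim p(x,y)}
\left[
\log
\frac{\exp\!\big(f(x)^{\top} g(y)/\tau\big)}
     {\sum_{j=1}^{N} \exp\!\big(f(x)^{\top} g(y_j)/\tau\big)}
\right]
\]

Here (x, y) is a positive pair drawn from the joint distribution p(x, y), while the remaining candidates y_j are negatives drawn independently from the marginal p(y), i.e., from the product of marginals; minimizing the loss trains the encoders to tell the two cases apart. In the same spirit, the "InfoMin" idea can be paraphrased as choosing views v_1, v_2 that minimize the mutual information I(v_1; v_2) subject to retaining the information relevant to the downstream task; this too is a summary-level paraphrase, not the thesis's formal statement.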