Learning Reconfigurable Vision Models

Over the past decade, deep learning methods have emerged as the predominant approach in a wide variety of fields, such as computer vision, natural language processing, and speech recognition. However, these models are also notorious for their high computational costs and substantial data requirements. Furthermore, they often present significant challenges to non-technical users, who typically lack the expertise needed to tailor these models to their specific applications. In this thesis, we tackle these challenges by amortizing the cost of training models for similar learning tasks. Instead of training multiple models independently, we propose learning a single, reconfigurable model that captures the spectrum of underlying problems. Once trained, this model can be dynamically reconfigured at inference time, adapting its properties without incurring additional training costs. First, we introduce Scale-Space Hypernetworks, a method for learning a continuum of CNNs with varying efficiency characteristics. This enables us to characterize an entire Pareto accuracy-efficiency curve of models by training a single hypernetwork, dramatically reducing training costs. Then, we characterize a previously unidentified optimization problem in hypernetwork training, and propose a revised hypernetwork formulation that leads to faster convergence and more stable training. Lastly, we present UniverSeg, an in-context learning method for universal biomedical image segmentation. Given a query image and an example set of image-label pairs that define a new segmentation task, it produces accurate segmentations without additional training, outperforming several related methods on unseen segmentation tasks. We empirically demonstrate the validity of our methods in real-world applications, focusing on computer vision and biomedical imaging, where we assess a wide array of tasks and datasets. In all of these works, we find that it is not only feasible to train reconfigurable models, but that in doing so we achieve substantial efficiency gains at both training and inference time.


Bibliographic Details
Main Author: Gonzalez Ortiz, Jose Javier
Other Authors: Guttag, John V.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/153839
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Date Issued: February 2024
Rights: In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/