Learning Reconfigurable Vision Models

Over the past decade, deep learning methods have emerged as the predominant approach in a wide variety of fields, such as computer vision, natural language processing, and speech recognition. However, these models are also notorious for their high computational costs and substantial data requirements. Furthermore, they often present significant challenges to non-technical users, who typically lack the expertise needed to tailor these models to their specific applications. In this thesis, we tackle these challenges by amortizing the cost of training models for similar learning tasks. Instead of training multiple models independently, we propose learning a single, reconfigurable model that captures the spectrum of underlying problems. Once trained, this model can be dynamically reconfigured at inference time, adapting its properties without incurring additional training costs. First, we introduce Scale-Space Hypernetworks, a method for learning a continuum of CNNs with varying efficiency characteristics. This enables us to characterize an entire Pareto accuracy-efficiency curve of models by training a single hypernetwork, dramatically reducing training costs. Then, we characterize a previously unidentified optimization problem in hypernetwork training, and propose a revised hypernetwork formulation that leads to faster convergence and more stable training. Lastly, we present UniverSeg, an in-context learning method for universal biomedical image segmentation. Given a query image and an example set of image-label pairs that define a new segmentation task, it produces accurate segmentations without additional training, outperforming several related methods on unseen segmentation tasks. We empirically demonstrate the validity of our methods in real-world applications, focusing on computer vision and biomedical imaging, where we assess a wide array of tasks and datasets. In all of these works, we find that it is not only feasible to train reconfigurable models, but that in doing so we achieve substantial efficiency gains at both training and inference time.


Bibliographic Details
Main Author: Gonzalez Ortiz, Jose Javier
Other Authors: Guttag, John V.
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/153839
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Date Issued: February 2024
Rights: In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/