Data-Efficient Machine Learning with Applications to Cardiology

Bibliographic Details
Main Author:	Raghu, Aniruddh
Other Authors:	Guttag, John V.
Format:	Thesis
Published:	Massachusetts Institute of Technology 2024
Online Access:	https://hdl.handle.net/1721.1/153841

author	Raghu, Aniruddh
author2	Guttag, John V.
collection	MIT
description	Deep learning models have demonstrated impressive capabilities in many settings, including computer vision, natural language generation, and speech processing. However, an important shortcoming of these models is that they often need to be trained on large datasets to be most effective. In domains such as medicine, large datasets are not always available, so there is a need for data-efficient models that perform well even in limited-data regimes. In this thesis, motivated by this need, we present four contributions to data-efficient machine learning: (1) analyzing and improving few-shot learning, where we study a popular few-shot learning algorithm (Model-Agnostic Meta-Learning), provide insights as to why it is effective, and propose a simplified version that offers substantial computational benefits; (2) improving supervised learning on small clinical datasets of electrocardiograms (ECGs), where we develop a new data augmentation strategy for ECGs that helps boost performance on a range of predictive problems; (3) improving pre-training through the use of nested optimization, introducing an efficient gradient-based algorithm to jointly optimize model parameters and pre-training algorithm design choices; and (4) developing a new self-supervised learning pipeline for complex clinical time series, where the design of the pipeline is driven by the multimodal, multi-dimensional nature of real-world clinical time series data. Unifying several of these contributions is the application area of cardiovascular medicine, a setting in which machine learning has the potential to improve patient care and outcomes.
first_indexed	2024-09-23T10:22:56Z
format	Thesis
id	mit-1721.1/153841
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T10:22:56Z
publishDate	2024
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/153841 2024-03-22T03:53:12Z Data-Efficient Machine Learning with Applications to Cardiology Raghu, Aniruddh Guttag, John V. Stultz, Collin M. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D. 2024-03-21T19:09:36Z 2024-03-21T19:09:36Z 2024-02 2024-02-21T17:19:10.428Z Thesis https://hdl.handle.net/1721.1/153841 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title	Data-Efficient Machine Learning with Applications to Cardiology
url	https://hdl.handle.net/1721.1/153841

Similar Items