Leveraging Structure and Knowledge in Clinical and Biomedical Representation Learning

Bibliographic Details
Main Author: McDermott, Matthew B. A.
Other Authors: Szolovits, Peter
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/144655
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Degree: Ph.D.
Date Issued: 2022-05
Rights: In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
File Format: application/pdf

Description

Datasets in the machine learning for health and biomedicine domain are often noisy, irregularly sampled, only sparsely labeled, and small relative to the dimensionality of both the data and the tasks. These problems motivate the use of representation learning in this domain, which encompasses a variety of techniques designed to produce representations of a dataset that are amenable to downstream modelling tasks. Representation learning in this domain can also take advantage of the significant external knowledge in the biomedical domain. In this thesis, I will explore novel pre-training and representation learning strategies for biomedical data which leverage external structure or knowledge to inform learning at both local and global scales. These techniques will be explored in four chapters: (1) leveraging unlabeled data to infer distributional constraints in a semi-supervised learning setting; (2) using graph convolutional neural networks over gene-gene co-regulatory networks to improve modelling of gene expression data; (3) adapting pre-training techniques from natural language processing to electronic health record data, and showing that novel methods are needed for electronic health record timeseries data; and (4) asserting global structure in pre-training applications through structure-inducing pre-training.