Deep learning benchmarks on L1000 gene expression data

Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019

Bibliographic Details
Main Author:	McDermott, Matthew B. A. (Matthew Brian Andrew)
Other Authors:	Peter Szolovits
Format:	Thesis
Language:	English
Published:	Massachusetts Institute of Technology 2019
Subjects:	Electrical Engineering and Computer Science.
Online Access:	https://hdl.handle.net/1721.1/121738

Notes:	Cataloged from PDF version of thesis. Includes bibliographical references (pages 57-62).
Summary:	Gene expression data holds the potential to offer deep, physiological insights about the dynamic state of a cell beyond the static coding of the genome alone. I believe that realizing this potential requires specialized machine learning methods capable of using underlying biological structure, but the development of such models is hampered by the lack of an empirical methodological foundation, including published benchmarks and well-characterized baselines. In this work, we lay that foundation by profiling a battery of classifiers against newly defined biologically motivated classification tasks on multiple L1000 gene expression datasets. In addition, on our smallest dataset, a privately produced L1000 corpus, we profile per-subject generalizability to provide a novel assessment of performance that is lost in many typical analyses. We compare traditional classifiers, including feed-forward artificial neural networks (FF-ANNs), linear methods, random forests, decision trees, and k-nearest-neighbor classifiers, as well as graph convolutional neural networks (GCNNs), which augment learning via prior biological domain knowledge. We find GCNNs offer performance improvements given sufficient data, excelling at all tasks on our largest dataset. On smaller datasets, FF-ANNs offer the greatest performance. Linear models significantly underperform on all dataset scales, but offer the best per-subject generalizability. Ultimately, these results suggest that structured models such as GCNNs can represent a new direction of focus for the field as our scale of data continues to increase.
Statement of Responsibility:	by Matthew B. A. McDermott.
Degree:	S.M.
Physical Description:	62 pages, application/pdf
Rights:	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Similar Items