Theory and Applications of Matrix Completion in Genomics Datasets

The advent of rapid and efficient biological screening and sequencing technologies has enabled high-throughput data collection, opening the door to improvements in drug discovery, disease identification, and personalized medicine, among others. The size and scope of such datasets are unprecedented, a...


Bibliographic Details
Main Author: Stefanakis, George
Other Authors: Uhler, Caroline
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/144547
_version_ 1826212793433456640
author Stefanakis, George
author2 Uhler, Caroline
author_facet Uhler, Caroline
Stefanakis, George
author_sort Stefanakis, George
collection MIT
description The advent of rapid and efficient biological screening and sequencing technologies has enabled high-throughput data collection, opening the door to improvements in drug discovery, disease identification, and personalized medicine, among others. The size and scope of such datasets are unprecedented, and their increased availability over the past decade, in conjunction with rapid advancements in statistical inference and machine learning, has paved the way for an explosion in research. Still, many problems in this space are as yet unexplored or still in their infancy, owing either to limited data availability or to a lack of computationally efficient or high-accuracy methods for modeling and prediction. In this work, we develop theory and demonstrate empirical results for the use of the novel Neural Tangent Kernel (NTK) in matrix completion. We derive the functional form of the NTK for a single-hidden-layer, infinite-width neural network with ReLU activation, and develop a framework applying the NTK to matrix completion. We explore a specific application of this framework, using the Connectivity Map dataset of gene expression data for various cells and perturbations, and demonstrate results competitive with other methods. Additionally, we analyze our contributions through the auxiliary lens of performance engineering and develop concrete algorithms for accurate, performant, and intuitive biological imputation.
first_indexed 2024-09-23T15:38:19Z
format Thesis
id mit-1721.1/144547
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T15:38:19Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/144547 2022-08-30T03:09:13Z Theory and Applications of Matrix Completion in Genomics Datasets Stefanakis, George Uhler, Caroline Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-08-29T15:55:03Z 2022-08-29T15:55:03Z 2022-05 2022-05-27T16:18:46.843Z Thesis https://hdl.handle.net/1721.1/144547 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle Stefanakis, George
Theory and Applications of Matrix Completion in Genomics Datasets
title Theory and Applications of Matrix Completion in Genomics Datasets
title_sort theory and applications of matrix completion in genomics datasets
url https://hdl.handle.net/1721.1/144547
work_keys_str_mv AT stefanakisgeorge theoryandapplicationsofmatrixcompletioningenomicsdatasets