Theory and Applications of Matrix Completion in Genomics Datasets
Main Author: | Stefanakis, George |
---|---|
Other Authors: | Uhler, Caroline |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2022 |
Online Access: | https://hdl.handle.net/1721.1/144547 |
_version_ | 1826212793433456640 |
---|---|
author | Stefanakis, George |
author2 | Uhler, Caroline |
author_facet | Uhler, Caroline Stefanakis, George |
author_sort | Stefanakis, George |
collection | MIT |
description | The advent of rapid and efficient biological screening and sequencing technologies has enabled high-throughput data collection, opening the door to improvements in drug discovery, disease identification, and personalized medicine, among others. The size and scope of such datasets are unprecedented, and their increased availability over the past decade, in conjunction with rapid advancements in statistical inference and machine learning, has paved the way for an explosion in research. Still, many problems in this space are as yet unexplored or still in their infancy, either due to limited data availability or a lack of computationally efficient or high-accuracy methods for modeling and prediction. In this work, we develop theory and demonstrate empirical results for the use of the Neural Tangent Kernel (NTK) in matrix completion. We derive the functional form of the NTK for a single-hidden-layer, infinite-width neural network with ReLU activation, and develop a framework for applying the NTK to matrix completion. We explore a specific application of this framework, using the Connectivity Map dataset of gene expression data for various cells and perturbations, and demonstrate results competitive with other methods. Additionally, we analyze our contributions through the auxiliary lens of performance engineering and develop concrete algorithms for accurate, performant, and intuitive biological imputation. |
first_indexed | 2024-09-23T15:38:19Z |
format | Thesis |
id | mit-1721.1/144547 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T15:38:19Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/144547 2022-08-30T03:09:13Z Theory and Applications of Matrix Completion in Genomics Datasets Stefanakis, George Uhler, Caroline Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-08-29T15:55:03Z 2022-08-29T15:55:03Z 2022-05 2022-05-27T16:18:46.843Z Thesis https://hdl.handle.net/1721.1/144547 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
title | Theory and Applications of Matrix Completion in Genomics Datasets |
url | https://hdl.handle.net/1721.1/144547 |