Gene expression prediction using low-rank matrix completion
Background: An exponential growth of high-throughput biological information and data has occurred in the past decade, supported by technologies such as microarrays and RNA-Seq. Most data generated using such methods are used to encode large amounts of rich information and to determine diagnostic and...
Main Authors: | Kapur, Arnav; Marwah, Kshitij; Alterovitz, Gil |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Language: | English |
Published: | BioMed Central 2016 |
Online Access: | http://hdl.handle.net/1721.1/104048 https://orcid.org/0000-0002-5952-9844 |
_version_ | 1811079663647195136 |
---|---|
author | Kapur, Arnav Marwah, Kshitij Alterovitz, Gil |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_facet | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Kapur, Arnav Marwah, Kshitij Alterovitz, Gil |
author_sort | Kapur, Arnav |
collection | MIT |
description | Background
An exponential growth of high-throughput biological information and data has occurred in the past decade, supported by technologies such as microarrays and RNA-Seq. Most data generated using such methods are used to encode large amounts of rich information and to determine diagnostic and prognostic biomarkers. Although data storage costs have fallen, the process of capturing data using these technologies is still expensive. Moreover, the time required for the assay, from sample preparation to raw value measurement, is excessive (on the order of days). There is an opportunity to reduce both the cost and time for generating such expression datasets.
Results
We propose a framework in which complete gene expression values can be reliably predicted in silico from partial measurements. This is achieved by modelling expression data as a low-rank matrix and then applying recently developed matrix completion techniques based on nonlinear convex optimisation. We evaluated the prediction of gene expression data on 133 studies comprising a combined total of 10,921 samples. We show that such datasets can be constructed with a low relative error even at high missing-value rates (>50 %), and that such predicted datasets can be reliably used as surrogates for further analysis.
Conclusion
This method has potentially far-reaching applications, including in how biomedical data are sourced and generated and in transcriptomic prediction by optimisation. We show that gene expression data can be computationally constructed, thereby potentially reducing the costs of gene expression profiling. The method shows great promise for opening new avenues of research on low-rank matrix completion in the biological sciences. |
first_indexed | 2024-09-23T11:18:42Z |
format | Article |
id | mit-1721.1/104048 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T11:18:42Z |
publishDate | 2016 |
publisher | BioMed Central |
record_format | dspace |
spelling | mit-1721.1/104048 2022-10-01T02:45:07Z Gene expression prediction using low-rank matrix completion Kapur, Arnav Marwah, Kshitij Alterovitz, Gil Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Alterovitz, Gil 2016-08-26T18:57:23Z 2016-08-26T18:57:23Z 2016-06 2015-11 2016-08-03T08:14:04Z Article http://purl.org/eprint/type/JournalArticle 1471-2105 http://hdl.handle.net/1721.1/104048 Kapur, Arnav, Kshitij Marwah, and Gil Alterovitz. “Gene Expression Prediction Using Low-Rank Matrix Completion.” BMC Bioinformatics 17.1 (2016): n. pag. https://orcid.org/0000-0002-5952-9844 en http://dx.doi.org/10.1186/s12859-016-1106-6 BMC Bioinformatics Creative Commons Attribution http://creativecommons.org/licenses/by/4.0/ Kapur et al. application/pdf BioMed Central BioMed Central |
spellingShingle | Kapur, Arnav Marwah, Kshitij Alterovitz, Gil Gene expression prediction using low-rank matrix completion |
title | Gene expression prediction using low-rank matrix completion |
title_full | Gene expression prediction using low-rank matrix completion |
title_fullStr | Gene expression prediction using low-rank matrix completion |
title_full_unstemmed | Gene expression prediction using low-rank matrix completion |
title_short | Gene expression prediction using low-rank matrix completion |
title_sort | gene expression prediction using low rank matrix completion |
url | http://hdl.handle.net/1721.1/104048 https://orcid.org/0000-0002-5952-9844 |
work_keys_str_mv | AT kapurarnav geneexpressionpredictionusinglowrankmatrixcompletion AT marwahkshitij geneexpressionpredictionusinglowrankmatrixcompletion AT alterovitzgil geneexpressionpredictionusinglowrankmatrixcompletion |
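
The Results section of the abstract describes treating expression data as a low-rank matrix and recovering unmeasured values by convex optimisation, with quality judged by relative error. This record does not say which solver the authors used, so the following is only a minimal sketch of the general idea, assuming a nuclear-norm-regularised formulation solved by the Soft-Impute iteration (repeated singular value soft-thresholding). The function names, the regularisation weight `lam`, and the synthetic genes-by-samples matrix are all illustrative assumptions and are not taken from the article.

```python
import numpy as np


def soft_impute(observed, mask, lam=2.0, n_iters=500, tol=1e-7):
    """Estimate a low-rank matrix from partially observed entries.

    Assumed formulation (not confirmed by the record): minimise
    0.5 * ||mask * (observed - Z)||_F^2 + lam * ||Z||_*,
    where ||Z||_* is the nuclear norm, a convex surrogate for rank.
    `mask` is True where an entry was actually measured.
    """
    estimate = np.zeros_like(observed, dtype=float)
    for _ in range(n_iters):
        # Keep measured entries; fill unmeasured ones from the current estimate.
        filled = np.where(mask, observed, estimate)
        # Soft-threshold the singular values (proximal step for the nuclear norm).
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)
        new_estimate = (u * s_shrunk) @ vt
        # Stop once successive estimates barely change.
        change = np.linalg.norm(new_estimate - estimate)
        scale = max(np.linalg.norm(estimate), 1e-12)
        estimate = new_estimate
        if change / scale < tol:
            break
    return estimate


def relative_error(truth, estimate):
    """Frobenius-norm relative error between an estimate and the full matrix."""
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)


# Synthetic stand-in for an expression matrix (genes x samples) of rank 3.
rng = np.random.default_rng(0)
n_genes, n_samples, rank = 60, 40, 3
truth = rng.normal(size=(n_genes, rank)) @ rng.normal(size=(rank, n_samples))

# Hide roughly half of the entries to mimic a 50 % missing-value rate.
mask = rng.random(truth.shape) < 0.5
observed = np.where(mask, truth, 0.0)

completed = soft_impute(observed, mask)
baseline = relative_error(truth, observed)   # error of simply zero-filling
err = relative_error(truth, completed)       # error after completion
print(f"zero-fill relative error: {baseline:.3f}")
print(f"completed relative error: {err:.3f}")
```

In this sketch `lam` trades off fidelity to the measured entries against low rank: larger values converge faster but shrink the recovered values more, so on real data it would normally be tuned, for example by holding out some measured entries.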