Multiple-kernel learning for genomic data mining and prediction

Abstract. Background: Advances in medical technology have allowed for customized prognosis, diagnosis, and treatment regimens that utilize multiple heterogeneous data sources. Multiple kernel learning (MKL) is well suited for the integration of multiple high-throughput data sources. MKL remains to be...


Bibliographic Details
Main Authors:	Christopher M. Wilson, Kaiqiao Li, Xiaoqing Yu, Pei-Fen Kuan, Xuefeng Wang
Format:	Article
Language:	English
Published:	BMC 2019-08-01
Series:	BMC Bioinformatics
Subjects:	Classification, Multiple kernel learning, Genomics, Data integration, Machine learning, Kernel methods
Online Access:	http://link.springer.com/article/10.1186/s12859-019-2992-1

_version_	1818882719802720256
author	Christopher M. Wilson, Kaiqiao Li, Xiaoqing Yu, Pei-Fen Kuan, Xuefeng Wang
author_sort	Christopher M. Wilson
collection	DOAJ
description	Abstract. Background: Advances in medical technology have allowed for customized prognosis, diagnosis, and treatment regimens that utilize multiple heterogeneous data sources. Multiple kernel learning (MKL) is well suited for the integration of multiple high-throughput data sources. MKL remains under-utilized by genomic researchers, partly due to the lack of unified guidelines for its use and of benchmark genomic datasets. Results: We provide three implementations of MKL in R. These methods are applied to simulated data to illustrate that MKL can select appropriate models. We also apply MKL to combine clinical information with miRNA gene expression data from an ovarian cancer study into a single analysis. Lastly, we show that MKL can identify gene sets that are known to play a role in the prognostic prediction of 15 cancer types using gene expression data from The Cancer Genome Atlas, as well as identify new gene sets for future research. Conclusion: Multiple kernel learning coupled with modern optimization techniques provides a promising learning tool for building predictive models based on multi-source genomic data. MKL also provides an automated scheme for kernel prioritization and parameter tuning. The methods used in the paper are implemented as an R package called RMKL, which is freely available for download through CRAN at https://CRAN.R-project.org/package=RMKL.
first_indexed	2024-12-19T15:22:14Z
format	Article
id	doaj.art-34f2d3acce89480fa1aaa76c0bab2e70
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-19T15:22:14Z
publishDate	2019-08-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-34f2d3acce89480fa1aaa76c0bab2e70; 2022-12-21T20:15:58Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-08-01; 20117; 10.1186/s12859-019-2992-1; Multiple-kernel learning for genomic data mining and prediction; Christopher M. Wilson (Department of Biostatistics and Bioinformatics at Moffitt Cancer Center); Kaiqiao Li (Department of Applied Mathematics and Statistics at Stony Brook University); Xiaoqing Yu (Department of Biostatistics and Bioinformatics at Moffitt Cancer Center); Pei-Fen Kuan (Department of Applied Mathematics and Statistics at Stony Brook University); Xuefeng Wang (Department of Biostatistics and Bioinformatics at Moffitt Cancer Center)
title	Multiple-kernel learning for genomic data mining and prediction
topic	Classification, Multiple kernel learning, Genomics, Data integration, Machine learning, Kernel methods
url	http://link.springer.com/article/10.1186/s12859-019-2992-1


Similar Items