Multiple-kernel learning for genomic data mining and prediction

Abstract Background Advances in medical technology have allowed for customized prognosis, diagnosis, and treatment regimens that utilize multiple heterogeneous data sources. Multiple kernel learning (MKL) is well suited for the integration of multiple high throughput data sources. MKL remains to be...

Bibliographic Details
Main Authors: Christopher M. Wilson, Kaiqiao Li, Xiaoqing Yu, Pei-Fen Kuan, Xuefeng Wang
Format: Article
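Language: English
Published: BMC 2019-08-01
Series: BMC Bioinformatics
Subjects: Classification, Multiple kernel learning, Genomics, Data integration, Machine learning, Kernel methods
Online Access: http://link.springer.com/article/10.1186/s12859-019-2992-1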
_version_ 1818882719802720256
author Christopher M. Wilson
Kaiqiao Li
Xiaoqing Yu
Pei-Fen Kuan
Xuefeng Wang
author_sort Christopher M. Wilson
collection DOAJ
description Abstract. Background: Advances in medical technology have allowed for customized prognosis, diagnosis, and treatment regimens that utilize multiple heterogeneous data sources. Multiple kernel learning (MKL) is well suited for the integration of multiple high-throughput data sources. MKL remains under-utilized by genomic researchers, partly due to the lack of unified guidelines for its use and of benchmark genomic datasets. Results: We provide three implementations of MKL in R. These methods are applied to simulated data to illustrate that MKL can select appropriate models. We also apply MKL to combine clinical information with miRNA gene expression data from an ovarian cancer study into a single analysis. Lastly, we show that MKL can identify gene sets that are known to play a role in the prognostic prediction of 15 cancer types using gene expression data from The Cancer Genome Atlas, as well as identify new gene sets for future research. Conclusion: Multiple kernel learning coupled with modern optimization techniques provides a promising learning tool for building predictive models based on multi-source genomic data. MKL also provides an automated scheme for kernel prioritization and parameter tuning. The methods used in the paper are implemented as an R package called RMKL, which is freely available for download through CRAN at https://CRAN.R-project.org/package=RMKL.
first_indexed 2024-12-19T15:22:14Z
format Article
id doaj.art-34f2d3acce89480fa1aaa76c0bab2e70
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-19T15:22:14Z
publishDate 2019-08-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-34f2d3acce89480fa1aaa76c0bab2e70 2022-12-21T20:15:58Z eng BMC BMC Bioinformatics 1471-2105 2019-08-01 20117 10.1186/s12859-019-2992-1
Multiple-kernel learning for genomic data mining and prediction
Christopher M. Wilson (Department of Biostatistics and Bioinformatics at Moffitt Cancer Center)
Kaiqiao Li (Department of Applied Mathematics and Statistics at Stony Brook University)
Xiaoqing Yu (Department of Biostatistics and Bioinformatics at Moffitt Cancer Center)
Pei-Fen Kuan (Department of Applied Mathematics and Statistics at Stony Brook University)
Xuefeng Wang (Department of Biostatistics and Bioinformatics at Moffitt Cancer Center)
title Multiple-kernel learning for genomic data mining and prediction
topic Classification
Multiple kernel learning
Genomics
Data integration
Machine learning
Kernel methods
url http://link.springer.com/article/10.1186/s12859-019-2992-1