A comparison of graph- and kernel-based –omics data integration algorithms for classifying complex traits

Abstract Background: High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in the quest to improve human health. Well-studied algorithms mostly deal with a single data source and cannot fully utilize the potential of these multi-omics data sources. In order t...

Bibliographic Details
Main Authors: Kang K. Yan, Hongyu Zhao, Herbert Pang
Format: Article
Language: English
Published: BMC 2017-12-01
Series: BMC Bioinformatics
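Subjects: Bayesian network; Relevance vector machine; Graph-based semi-supervised learning; Semi-definite programming (SDP)-support vector machine; Multiple data sources; Classification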
Online Access: http://link.springer.com/article/10.1186/s12859-017-1982-4
author Kang K. Yan
Hongyu Zhao
Herbert Pang
collection DOAJ
description Abstract Background: High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in the quest to improve human health. Well-studied algorithms mostly deal with a single data source and cannot fully utilize the potential of these multi-omics data sources. To provide a holistic understanding of human health and diseases, it is necessary to integrate multiple data sources. Several algorithms have been proposed so far; however, a comprehensive comparison of data integration algorithms for classification of binary traits is currently lacking. Results: In this paper, we focus on two common classes of integration algorithms: graph-based algorithms, which depict subjects as nodes and the relationships between them as edges, and kernel-based algorithms, which can generate a classifier in feature space. Our paper provides a comprehensive comparison of their performance in terms of various measurements of classification accuracy and computation time. Seven integration algorithms (graph-based semi-supervised learning, graph sharpening integration, composite association network, Bayesian network, semi-definite programming-support vector machine (SDP-SVM), relevance vector machine (RVM), and Ada-boost relevance vector machine) are compared and evaluated with hypertension and two cancer data sets in our study. In general, kernel-based algorithms create more complex models and require longer computation time, but they tend to perform better than graph-based algorithms. Graph-based algorithms have the advantage of being computationally faster. Conclusions: The empirical results demonstrate that composite association network, relevance vector machine, and Ada-boost RVM are the better performers. We provide recommendations on how to choose an appropriate algorithm for integrating data from multiple sources.
first_indexed 2024-12-21T01:45:53Z
format Article
id doaj.art-daa846269b1a4d6c83a374ff9031a82a
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-21T01:45:53Z
publishDate 2017-12-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-daa846269b1a4d6c83a374ff9031a82a; 2022-12-21T19:20:02Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-12-01; volume 18, issue 1, pages 1-13; 10.1186/s12859-017-1982-4; A comparison of graph- and kernel-based –omics data integration algorithms for classifying complex traits; Kang K. Yan (School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong); Hongyu Zhao (Department of Biostatistics, Yale University); Herbert Pang (School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong); http://link.springer.com/article/10.1186/s12859-017-1982-4; Bayesian network; Relevance vector machine; Graph-based semi-supervised learning; Semi-definite programming (SDP)-support vector machine; Multiple data sources; Classification
title A comparison of graph- and kernel-based –omics data integration algorithms for classifying complex traits
title_sort comparison of graph and kernel based omics data integration algorithms for classifying complex traits
topic Bayesian network
Relevance vector machine
Graph-based semi-supervised learning
Semi-definite programming (SDP)-support vector machine
Multiple data sources
Classification
url http://link.springer.com/article/10.1186/s12859-017-1982-4