Ensemble methods of rank-based trees for single sample classification with gene expression profiles

Abstract Building Single Sample Predictors (SSPs) from gene expression profiles presents challenges, notably due to the lack of calibration across diverse gene expression measurement technologies. However, recent research indicates the viability of classifying phenotypes based on the order of expres...

Bibliographic Details
Main Authors: Min Lu, Ruijie Yin, X. Steven Chen
Format: Article
Language: English
Published: BMC 2024-02-01
Series: Journal of Translational Medicine
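Subjects: Single sample predictor; Decision tree; Rank discriminant; Ensemble learning; Boosting; Random forest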
Online Access: https://doi.org/10.1186/s12967-024-04940-2
_version_ 1827326367912427520
author Min Lu
Ruijie Yin
X. Steven Chen
author_facet Min Lu
Ruijie Yin
X. Steven Chen
author_sort Min Lu
collection DOAJ
description Abstract Building Single Sample Predictors (SSPs) from gene expression profiles presents challenges, notably due to the lack of calibration across diverse gene expression measurement technologies. However, recent research indicates the viability of classifying phenotypes based on the order of expression of multiple genes. Existing SSP methods often rely on Top Scoring Pairs (TSP), which are platform-independent and easy to interpret through the concept of “relative expression reversals”. Nevertheless, TSP methods face limitations in classifying complex patterns involving comparisons of more than two gene expressions. To overcome these constraints, we introduce a novel approach that extends TSP rules by constructing rank-based trees capable of encompassing extensive gene-gene comparisons. This method is bolstered by incorporating two ensemble strategies, boosting and random forest, to mitigate the risk of overfitting. Our implementation of ensemble rank-based trees employs boosting with LogitBoost cost and random forests, addressing both binary and multi-class classification problems. In a comparative analysis across 12 cancer gene expression datasets, our proposed methods demonstrate superior performance over both the k-TSP classifier and nearest template prediction methods. We have further refined our approach to facilitate variable selection and the generation of clear, precise decision rules from rank-based trees, enhancing interpretability. The cumulative evidence from our research underscores the significant potential of ensemble rank-based trees in advancing disease classification via gene expression data, offering a robust, interpretable, and scalable solution. Our software is available at https://CRAN.R-project.org/package=ranktreeEnsemble .
first_indexed 2024-03-07T14:43:26Z
format Article
id doaj.art-2cf82e1fceca49da9711527fee41003a
institution Directory Open Access Journal
issn 1479-5876
language English
last_indexed 2024-03-07T14:43:26Z
publishDate 2024-02-01
publisher BMC
record_format Article
series Journal of Translational Medicine
spelling doaj.art-2cf82e1fceca49da9711527fee41003a
2024-03-05T20:07:10Z
eng
BMC
Journal of Translational Medicine
1479-5876
2024-02-01
221113
10.1186/s12967-024-04940-2
Ensemble methods of rank-based trees for single sample classification with gene expression profiles
Min Lu 0
Ruijie Yin 1
X. Steven Chen 2
Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami
Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami
Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami
Abstract Building Single Sample Predictors (SSPs) from gene expression profiles presents challenges, notably due to the lack of calibration across diverse gene expression measurement technologies. However, recent research indicates the viability of classifying phenotypes based on the order of expression of multiple genes. Existing SSP methods often rely on Top Scoring Pairs (TSP), which are platform-independent and easy to interpret through the concept of “relative expression reversals”. Nevertheless, TSP methods face limitations in classifying complex patterns involving comparisons of more than two gene expressions. To overcome these constraints, we introduce a novel approach that extends TSP rules by constructing rank-based trees capable of encompassing extensive gene-gene comparisons. This method is bolstered by incorporating two ensemble strategies, boosting and random forest, to mitigate the risk of overfitting. Our implementation of ensemble rank-based trees employs boosting with LogitBoost cost and random forests, addressing both binary and multi-class classification problems. In a comparative analysis across 12 cancer gene expression datasets, our proposed methods demonstrate superior performance over both the k-TSP classifier and nearest template prediction methods. We have further refined our approach to facilitate variable selection and the generation of clear, precise decision rules from rank-based trees, enhancing interpretability. The cumulative evidence from our research underscores the significant potential of ensemble rank-based trees in advancing disease classification via gene expression data, offering a robust, interpretable, and scalable solution. Our software is available at https://CRAN.R-project.org/package=ranktreeEnsemble .
https://doi.org/10.1186/s12967-024-04940-2
Single sample predictor
Decision tree
Rank discriminant
Ensemble learning
Boosting
Random forest
spellingShingle Min Lu
Ruijie Yin
X. Steven Chen
Ensemble methods of rank-based trees for single sample classification with gene expression profiles
Journal of Translational Medicine
Single sample predictor
Decision tree
Rank discriminant
Ensemble learning
Boosting
Random forest
title Ensemble methods of rank-based trees for single sample classification with gene expression profiles
title_full Ensemble methods of rank-based trees for single sample classification with gene expression profiles
title_fullStr Ensemble methods of rank-based trees for single sample classification with gene expression profiles
title_full_unstemmed Ensemble methods of rank-based trees for single sample classification with gene expression profiles
title_short Ensemble methods of rank-based trees for single sample classification with gene expression profiles
title_sort ensemble methods of rank based trees for single sample classification with gene expression profiles
topic Single sample predictor
Decision tree
Rank discriminant
Ensemble learning
Boosting
Random forest
url https://doi.org/10.1186/s12967-024-04940-2
work_keys_str_mv AT minlu ensemblemethodsofrankbasedtreesforsinglesampleclassificationwithgeneexpressionprofiles
AT ruijieyin ensemblemethodsofrankbasedtreesforsinglesampleclassificationwithgeneexpressionprofiles
AT xstevenchen ensemblemethodsofrankbasedtreesforsinglesampleclassificationwithgeneexpressionprofiles
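
Illustrative note on the abstract: the "relative expression reversals" idea behind Top Scoring Pairs, and its extension to rank-based trees, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' method and not the API of the ranktreeEnsemble package. The gene names, expression values, class labels, and the functions `tsp_rule` and `rank_tree_predict` are all hypothetical. It shows only why a rule built from within-sample gene order gives the same answer after a monotone rescaling of the measurements, which is the sense in which such rules are platform-independent.

```python
import math


def tsp_rule(profile, gene_a, gene_b):
    """Single pairwise rule: 1 if gene_a is expressed below gene_b, else 0.

    Only the order of the two values within one sample is used.
    """
    return int(profile[gene_a] < profile[gene_b])


def rank_tree_predict(profile):
    """Hypothetical two-level tree whose splits are gene-gene comparisons.

    Each internal node asks an order question, so the tree combines
    more than one pairwise comparison into a single decision rule.
    """
    if profile["GENE_A"] < profile["GENE_B"]:
        if profile["GENE_C"] < profile["GENE_A"]:
            return "class_1"
        return "class_2"
    return "class_3"


# Hypothetical expression values for one sample on a raw intensity scale.
sample_raw = {"GENE_A": 120.0, "GENE_B": 450.0, "GENE_C": 80.0}

# The same sample after a strictly increasing transform (log2 with offset),
# standing in for a differently calibrated measurement scale.
sample_log = {gene: math.log2(value + 1.0) for gene, value in sample_raw.items()}

pair_raw = tsp_rule(sample_raw, "GENE_A", "GENE_B")
pair_log = tsp_rule(sample_log, "GENE_A", "GENE_B")
tree_raw = rank_tree_predict(sample_raw)
tree_log = rank_tree_predict(sample_log)

print(pair_raw, pair_log, tree_raw, tree_log)
```

Because every split depends only on which of two genes is larger within the sample, the prediction needs no reference cohort or cross-sample normalization. The ensemble strategies named in the abstract (boosting and random forest) would combine many such trees; that step is not shown here.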