Cancer prognosis prediction using somatic point mutation and copy number variation data: a comparison of gene-level and pathway-based models

Abstract Background Genomic profiling of solid human tumors by projects such as The Cancer Genome Atlas (TCGA) has provided important information regarding the somatic alterations that drive cancer progression and patient survival. Although researchers have successfully leveraged TCGA data to build...

Bibliographic Details
Main Authors: Xingyu Zheng, Christopher I. Amos, H. Robert Frost
Format: Article
Language: English
Published: BMC 2020-10-01
Series: BMC Bioinformatics
Online Access: http://link.springer.com/article/10.1186/s12859-020-03791-0
collection DOAJ
description Abstract
Background: Genomic profiling of solid human tumors by projects such as The Cancer Genome Atlas (TCGA) has provided important information regarding the somatic alterations that drive cancer progression and patient survival. Although researchers have successfully leveraged TCGA data to build prognostic models, most efforts have focused on specific cancer types and a targeted set of gene-level predictors. Less is known about the prognostic ability of pathway-level variables in a pan-cancer setting. To address these limitations, we systematically evaluated and compared the prognostic ability of somatic point mutation (SPM) and copy number variation (CNV) data, and of gene-level and pathway-level models, across a diverse set of TCGA cancer types and predictive modeling approaches.
Results: We evaluated gene-level and pathway-level penalized Cox proportional hazards models using SPM and CNV data for 29 different TCGA cohorts. We measured predictive accuracy as the concordance index for predicting survival outcomes. Our comprehensive analysis suggests that the use of pathway-level predictors did not offer superior predictive power relative to gene-level models for all cancer types but had the advantages of robustness and parsimony. We identified a set of cohorts for which somatic alterations could not predict prognosis, and one distinctive cohort, LGG (lower grade glioma), for which SPM data were more predictive than CNV data and predictive accuracy was good for all model types. We found that pathway-level predictors provide superior interpretative value and that gene-level models often suffer from serious collinearity, an issue that pathway-level models avoided.
Conclusion: Our comprehensive analysis suggests that when using somatic alteration data for cancer prognosis prediction, pathway-level models are more interpretable, stable and parsimonious than gene-level models. Pathway-level models also avoid the issue of collinearity, which can be serious for gene-level somatic alterations. The prognostic power of somatic alterations is highly variable across cancer types, and we identified a set of cohorts for which somatic alterations could not predict prognosis. In general, CNV data predict prognosis better than SPM data, with the exception of the LGG cohort.
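The abstract reports predictive accuracy as the concordance index. As a rough illustration of what that metric measures, the sketch below computes a simple Harrell-style concordance index in plain Python. It is not the authors' implementation: the function name `concordance_index`, its arguments, the tie handling (pairs with tied survival times are skipped, tied risk scores count one half) and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: skip pairs with tied times for simplicity
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, one censored.
toy_times = [5, 8, 12, 3]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.3, 0.1, 0.7]
print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which a cohort's somatic alterations can be said to predict prognosis or not.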
first_indexed 2024-12-17T03:06:54Z
id doaj.art-cd0b387808cf444cb1b53f6e1eeed2ee
institution Directory Open Access Journal
issn 1471-2105
last_indexed 2024-12-17T03:06:54Z
spelling doaj.art-cd0b387808cf444cb1b53f6e1eeed2ee 2022-12-21T22:05:55Z eng BMC BMC Bioinformatics 1471-2105 2020-10-01 211119 10.1186/s12859-020-03791-0
Xingyu Zheng (0), Christopher I. Amos (1), H. Robert Frost (2)
(0), (1), (2): Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College