Cancer prognosis prediction using somatic point mutation and copy number variation data: a comparison of gene-level and pathway-based models
Abstract. Background: Genomic profiling of solid human tumors by projects such as The Cancer Genome Atlas (TCGA) has provided important information regarding the somatic alterations that drive cancer progression and patient survival. Although researchers have successfully leveraged TCGA data to build...
Main Authors: | Xingyu Zheng, Christopher I. Amos, H. Robert Frost |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2020-10-01 |
Series: | BMC Bioinformatics |
Online Access: | http://link.springer.com/article/10.1186/s12859-020-03791-0 |
_version_ | 1818655263115182080 |
---|---|
author | Xingyu Zheng; Christopher I. Amos; H. Robert Frost |
collection | DOAJ |
description | Abstract. Background: Genomic profiling of solid human tumors by projects such as The Cancer Genome Atlas (TCGA) has provided important information regarding the somatic alterations that drive cancer progression and patient survival. Although researchers have successfully leveraged TCGA data to build prognostic models, most efforts have focused on specific cancer types and a targeted set of gene-level predictors. Less is known about the prognostic ability of pathway-level variables in a pan-cancer setting. To address these limitations, we systematically evaluated and compared the prognostic ability of somatic point mutation (SPM) and copy number variation (CNV) data, and of gene-level and pathway-level models, across a diverse set of TCGA cancer types and predictive modeling approaches. Results: We evaluated gene-level and pathway-level penalized Cox proportional hazards models using SPM and CNV data for 29 different TCGA cohorts. We measured predictive accuracy as the concordance index for predicting survival outcomes. Our comprehensive analysis suggests that the use of pathway-level predictors did not offer superior predictive power relative to gene-level models for all cancer types but had the advantages of robustness and parsimony. We identified a set of cohorts for which somatic alterations could not predict prognosis, and a unique cohort, LGG, for which SPM data were more predictive than CNV data and predictive accuracy was good for all model types. We found that the pathway-level predictors provide superior interpretative value and that gene-level models often have a serious collinearity issue, which pathway-level models avoid. Conclusion: Our comprehensive analysis suggests that, when using somatic alteration data for cancer prognosis prediction, pathway-level models are more interpretable, stable and parsimonious than gene-level models. Pathway-level models also avoid the issue of collinearity, which can be serious for gene-level somatic alterations. The prognostic power of somatic alterations is highly variable across cancer types, and we have identified a set of cohorts for which somatic alterations could not predict prognosis. In general, CNV data predict prognosis better than SPM data, with the exception of the LGG cohort. |
first_indexed | 2024-12-17T03:06:54Z |
format | Article |
id | doaj.art-cd0b387808cf444cb1b53f6e1eeed2ee |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-17T03:06:54Z |
publishDate | 2020-10-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-cd0b387808cf444cb1b53f6e1eeed2ee; 2022-12-21T22:05:55Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2020-10-01; vol. 21, no. 1, pp. 1-19; 10.1186/s12859-020-03791-0; Cancer prognosis prediction using somatic point mutation and copy number variation data: a comparison of gene-level and pathway-based models; Xingyu Zheng, Christopher I. Amos, H. Robert Frost (all: Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College); http://link.springer.com/article/10.1186/s12859-020-03791-0 |
title | Cancer prognosis prediction using somatic point mutation and copy number variation data: a comparison of gene-level and pathway-based models |
url | http://link.springer.com/article/10.1186/s12859-020-03791-0 |
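The abstract in this record reports predictive accuracy as the concordance index for survival outcomes. This record does not show how the authors computed it, so the following is only a minimal sketch of the textbook pairwise definition (Harrell's C). The function name `concordance_index`, the convention of skipping pairs with tied times, and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risks are ordered correctly.

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means a worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed event.
        # Simplifying assumption: pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the earlier death received the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [4.0, 3.0, 2.0, 1.0]
print(concordance_index(toy_times, toy_events, toy_risks))
```

Under this definition a value near 0.5 corresponds to random ordering and a value near 1 to perfect ordering. This is one way to read the abstract's statement that somatic alterations "could not predict prognosis" for some cohorts.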