SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression
Motivation: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancers. Our goal was to create reliable prediction models using TCGA data and validat...
Main Authors: | Omri Nayshool, Nitzan Kol, Elisheva Javaski, Ninette Amariglio, Gideon Rechavi |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2022-10-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.1177/11769351221127875 |
_version_ | 1828001595276656640 |
---|---|
author | Omri Nayshool Nitzan Kol Elisheva Javaski Ninette Amariglio Gideon Rechavi |
author_facet | Omri Nayshool Nitzan Kol Elisheva Javaski Ninette Amariglio Gideon Rechavi |
author_sort | Omri Nayshool |
collection | DOAJ |
description | Motivation: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancer. Our goal was to create reliable prediction models using TCGA data and validate them using an external dataset. Results: For 16 TCGA cancer type cohorts, we optimized a Random Forest prediction model using a parameter grid search followed by a backward feature elimination loop for dimensionality reduction. For each feature that was removed, the model was retrained and the area under the curve of the receiver operating characteristic (AUC-ROC) was calculated using test data. Five prediction models achieved an AUC-ROC greater than 0.80. We used Clinical Proteomic Tumor Analysis Consortium v3 (CPTAC3) data for validation. The most enriched pathways for the top models were those involved in basic functions related to tumorigenesis and organ development. Enrichment was explored for 2 prediction models of the TCGA-KIRP cohort, one with 42 genes (AUC-ROC = 0.86) and the other with 300 genes (AUC-ROC = 0.85). The most enriched networks for both models share only 5 network nodes: DMBT1, IL11, HOXB6, TRIB3, PIM1. These genes play a significant role in renal cancer and might be used for prognosis prediction and as candidate therapeutic targets. Availability and Implementation: The prediction models were created and tested using the Python scikit-learn package. They are freely accessible via a user-friendly web interface we called surviveAI at https://tinyurl.com/surviveai . |
first_indexed | 2024-04-10T06:31:29Z |
format | Article |
id | doaj.art-0b31211aa33c481c889e776282ad6f2b |
institution | Directory of Open Access Journals |
issn | 1176-9351 |
language | English |
last_indexed | 2024-04-10T06:31:29Z |
publishDate | 2022-10-01 |
publisher | SAGE Publishing |
record_format | Article |
series | Cancer Informatics |
spelling | doaj.art-0b31211aa33c481c889e776282ad6f2b; 2023-03-01T06:33:11Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2022-10-01; 21; 10.1177/11769351221127875; SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression; Omri Nayshool (0); Nitzan Kol (1); Elisheva Javaski (2); Ninette Amariglio (3); Gideon Rechavi (4); Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Israel; Bioinformatics Unit, Sheba Cancer Research Center and Wohl Institute for Translational Medicine, Sheba Medical Center, Tel HaShomer, Israel; Bioinformatics Unit, Sheba Cancer Research Center and Wohl Institute for Translational Medicine, Sheba Medical Center, Tel HaShomer, Israel; Bioinformatics Unit, Sheba Cancer Research Center and Wohl Institute for Translational Medicine, Sheba Medical Center, Tel HaShomer, Israel; Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Israel; Motivation: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancer. Our goal was to create reliable prediction models using TCGA data and validate them using an external dataset. Results: For 16 TCGA cancer type cohorts, we optimized a Random Forest prediction model using a parameter grid search followed by a backward feature elimination loop for dimensionality reduction. For each feature that was removed, the model was retrained and the area under the curve of the receiver operating characteristic (AUC-ROC) was calculated using test data. Five prediction models achieved an AUC-ROC greater than 0.80. We used Clinical Proteomic Tumor Analysis Consortium v3 (CPTAC3) data for validation. The most enriched pathways for the top models were those involved in basic functions related to tumorigenesis and organ development. Enrichment was explored for 2 prediction models of the TCGA-KIRP cohort, one with 42 genes (AUC-ROC = 0.86) and the other with 300 genes (AUC-ROC = 0.85). The most enriched networks for both models share only 5 network nodes: DMBT1, IL11, HOXB6, TRIB3, PIM1. These genes play a significant role in renal cancer and might be used for prognosis prediction and as candidate therapeutic targets. Availability and Implementation: The prediction models were created and tested using the Python scikit-learn package. They are freely accessible via a user-friendly web interface we called surviveAI at https://tinyurl.com/surviveai ; https://doi.org/10.1177/11769351221127875 |
spellingShingle | Omri Nayshool Nitzan Kol Elisheva Javaski Ninette Amariglio Gideon Rechavi SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression Cancer Informatics |
title | SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression |
title_full | SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression |
title_fullStr | SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression |
title_full_unstemmed | SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression |
title_short | SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression |
title_sort | surviveai long term survival prediction of cancer patients based on somatic rna seq expression |
url | https://doi.org/10.1177/11769351221127875 |
work_keys_str_mv | AT omrinayshool surviveailongtermsurvivalpredictionofcancerpatientsbasedonsomaticrnaseqexpression AT nitzankol surviveailongtermsurvivalpredictionofcancerpatientsbasedonsomaticrnaseqexpression AT elishevajavaski surviveailongtermsurvivalpredictionofcancerpatientsbasedonsomaticrnaseqexpression AT ninetteamariglio surviveailongtermsurvivalpredictionofcancerpatientsbasedonsomaticrnaseqexpression AT gideonrechavi surviveailongtermsurvivalpredictionofcancerpatientsbasedonsomaticrnaseqexpression |
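
The workflow summarized in the description (a parameter grid search for a Random Forest, then a backward feature elimination loop in which the model is retrained and its AUC-ROC is computed on test data after each removal) might be sketched roughly as follows with scikit-learn. This is an illustrative sketch only, not the authors' code: the synthetic data, the parameter grid, the helper name `backward_elimination`, and the choice to drop the feature with the lowest impurity-based importance at each step are all assumptions, since the record does not state how the feature to remove was selected.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def backward_elimination(X_train, y_train, X_test, y_test, params,
                         min_features=2, seed=0):
    """Hypothetical helper: retrain and score after each feature removal."""
    kept = list(range(X_train.shape[1]))
    history = []  # (number of features, test AUC-ROC, feature indices)
    while len(kept) >= min_features:
        model = RandomForestClassifier(random_state=seed, **params)
        model.fit(X_train[:, kept], y_train)
        proba = model.predict_proba(X_test[:, kept])[:, 1]
        history.append((len(kept), roc_auc_score(y_test, proba), list(kept)))
        # Assumed removal rule: drop the least important remaining feature.
        kept.pop(int(np.argmin(model.feature_importances_)))
    return history


# Synthetic stand-in for an expression matrix (samples x genes)
# with a binary survival label; not TCGA data.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: grid search over a small, illustrative parameter grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_train, y_train)

# Step 2: backward elimination using the best parameters found.
history = backward_elimination(X_train, y_train, X_test, y_test,
                               grid.best_params_)
best_n, best_auc, best_features = max(history, key=lambda h: h[1])
print(f"best test AUC-ROC {best_auc:.3f} with {best_n} features")
```

Choosing the feature subset by its test-set AUC-ROC tends to give an optimistic estimate, which is one reason an external cohort (CPTAC3 in the description) is useful for validation.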