Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning
Abstract Background Cancer molecular subtyping plays a critical role in individualized patient treatment. In previous studies, high-throughput gene expression signature-based methods have been proposed to identify cancer subtypes. Unfortunately, the existing ones suffer from the curse of dimensional...
Main Authors: | Shaochuan Li, Yuning Yang, Xin Wang, Jun Li, Jun Yu, Xiangtao Li, Ka-Chun Wong |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2022-04-01 |
Series: | BioData Mining |
Subjects: | DeepCSD, Cancer subtype identification, Differential gene expression |
Online Access: | https://doi.org/10.1186/s13040-022-00295-w |
_version_ | 1817989047062102016 |
---|---|
author | Shaochuan Li Yuning Yang Xin Wang Jun Li Jun Yu Xiangtao Li Ka-Chun Wong |
author_facet | Shaochuan Li Yuning Yang Xin Wang Jun Li Jun Yu Xiangtao Li Ka-Chun Wong |
author_sort | Shaochuan Li |
collection | DOAJ |
description | Abstract Background Cancer molecular subtyping plays a critical role in individualized patient treatment. In previous studies, high-throughput gene expression signature-based methods have been proposed to identify cancer subtypes. Unfortunately, the existing ones suffer from the curse of dimensionality, data sparsity, and computational deficiency. Methods To address those problems, we propose a computational framework for colorectal cancer subtyping without any exploitation in model complexity and generality. A supervised learning framework based on deep learning (DeepCSD) is proposed to identify cancer subtypes. Specifically, based on the differentially expressed genes under cancer consensus molecular subtyping, we design a minimalist feed-forward neural network to capture the distinct molecular features in different cancer subtypes. To mitigate the overfitting phenomenon of deep learning as much as possible, L1 and L2 regularization and dropout layers are added. Results For demonstrating the effectiveness of DeepCSD, we compared it with other methods including Random Forest (RF), Deep forest (gcForest), support vector machine (SVM), XGBoost, and DeepCC on eight independent colorectal cancer datasets. The results reflect that DeepCSD can achieve superior performance over other algorithms. In addition, gene ontology enrichment and pathology analysis are conducted to reveal novel insights into the cancer subtype identification and characterization mechanisms. Conclusions DeepCSD considers all subtype-specific genes as input, which is pathologically necessary for its completeness. At the same time, DeepCSD shows remarkable robustness in handling cross-platform gene expression data, achieving similar performance on both training and test data without significant model overfitting or exploitation of model complexity. |
first_indexed | 2024-04-14T00:41:23Z |
format | Article |
id | doaj.art-3d80c10bfc744d13b2d8a0324707908d |
institution | Directory Open Access Journal |
issn | 1756-0381 |
language | English |
last_indexed | 2024-04-14T00:41:23Z |
publishDate | 2022-04-01 |
publisher | BMC |
record_format | Article |
series | BioData Mining |
spelling | doaj.art-3d80c10bfc744d13b2d8a0324707908d; 2022-12-22T02:22:10Z; eng; BMC; BioData Mining; 1756-0381; 2022-04-01; 151116; 10.1186/s13040-022-00295-w; Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning; Shaochuan Li [0]; Yuning Yang [1]; Xin Wang [2]; Jun Li [3]; Jun Yu [4]; Xiangtao Li [5]; Ka-Chun Wong [6]; [0] Department of Information Science and Technology, Northeast Normal University; [1] Department of Information Science and Technology, Northeast Normal University; [2] Department of Surgery, Chinese University of Hong Kong; [3] Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, and School of Data Science, City University of Hong Kong; [4] Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong; [5] School of Artificial Intelligence, Jilin University; [6] Department of Computer Science, City University of Hong Kong; Abstract Background Cancer molecular subtyping plays a critical role in individualized patient treatment. In previous studies, high-throughput gene expression signature-based methods have been proposed to identify cancer subtypes. Unfortunately, the existing ones suffer from the curse of dimensionality, data sparsity, and computational deficiency. Methods To address those problems, we propose a computational framework for colorectal cancer subtyping without any exploitation in model complexity and generality. A supervised learning framework based on deep learning (DeepCSD) is proposed to identify cancer subtypes. Specifically, based on the differentially expressed genes under cancer consensus molecular subtyping, we design a minimalist feed-forward neural network to capture the distinct molecular features in different cancer subtypes. To mitigate the overfitting phenomenon of deep learning as much as possible, L1 and L2 regularization and dropout layers are added. Results For demonstrating the effectiveness of DeepCSD, we compared it with other methods including Random Forest (RF), Deep forest (gcForest), support vector machine (SVM), XGBoost, and DeepCC on eight independent colorectal cancer datasets. The results reflect that DeepCSD can achieve superior performance over other algorithms. In addition, gene ontology enrichment and pathology analysis are conducted to reveal novel insights into the cancer subtype identification and characterization mechanisms. Conclusions DeepCSD considers all subtype-specific genes as input, which is pathologically necessary for its completeness. At the same time, DeepCSD shows remarkable robustness in handling cross-platform gene expression data, achieving similar performance on both training and test data without significant model overfitting or exploitation of model complexity.; https://doi.org/10.1186/s13040-022-00295-w; DeepCSD; Cancer subtype identification; Differential gene expression |
spellingShingle | Shaochuan Li Yuning Yang Xin Wang Jun Li Jun Yu Xiangtao Li Ka-Chun Wong Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning BioData Mining DeepCSD Cancer subtype identification Differential gene expression |
title | Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning |
title_full | Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning |
title_fullStr | Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning |
title_full_unstemmed | Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning |
title_short | Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning |
title_sort | colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning |
topic | DeepCSD Cancer subtype identification Differential gene expression |
url | https://doi.org/10.1186/s13040-022-00295-w |
work_keys_str_mv | AT shaochuanli colorectalcancersubtypeidentificationfromdifferentialgeneexpressionlevelsusingminimalistdeeplearning AT yuningyang colorectalcancersubtypeidentificationfromdifferentialgeneexpressionlevelsusingminimalistdeeplearning AT xinwang colorectalcancersubtypeidentificationfromdifferentialgeneexpressionlevelsusingminimalistdeeplearning AT junli colorectalcancersubtypeidentificationfromdifferentialgeneexpressionlevelsusingminimalistdeeplearning AT junyu colorectalcancersubtypeidentificationfromdifferentialgeneexpressionlevelsusingminimalistdeeplearning AT xiangtaoli colorectalcancersubtypeidentificationfromdifferentialgeneexpressionlevelsusingminimalistdeeplearning AT kachunwong colorectalcancersubtypeidentificationfromdifferentialgeneexpressionlevelsusingminimalistdeeplearning |
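
The abstract describes a minimalist feed-forward network with L1 and L2 regularization and dropout, but this record gives no architecture details. The following is only a rough sketch of that general technique in plain Python, not the authors' DeepCSD implementation. The layer sizes, dropout rate, penalty weights, ReLU activation, softmax output, and all function names are illustrative assumptions. The four-way output assumes the four consensus molecular subtypes (CMS1 to CMS4).

```python
import math
import random


def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def dropout(v, rate, rng, training):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(v)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def elastic_penalty(layers, l1, l2):
    """L1 + L2 penalty over all weight matrices (biases excluded)."""
    flat = [w for weights, _ in layers for row in weights for w in row]
    return l1 * sum(abs(w) for w in flat) + l2 * sum(w * w for w in flat)


def forward(x, layers, rate, rng, training=False):
    """Hidden layers use ReLU + dropout; the last layer feeds a softmax."""
    h = x
    for weights, biases in layers[:-1]:
        h = dropout(relu(dense(h, weights, biases)), rate, rng, training)
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))


def loss(x, label, layers, rate, rng, l1=1e-4, l2=1e-4):
    """Cross-entropy on one sample plus the regularization penalty."""
    probs = forward(x, layers, rate, rng, training=True)
    return -math.log(probs[label] + 1e-12) + elastic_penalty(layers, l1, l2)


def init_layers(sizes, rng):
    """Small random weights; `sizes` is e.g. [n_genes, hidden, n_subtypes]."""
    return [([[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
              for _ in range(n_out)], [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: 5 input genes, 8 hidden units, 4 subtypes (assumed).
    layers = init_layers([5, 8, 4], rng)
    sample = [0.1, 0.2, 0.3, 0.4, 0.5]
    print(forward(sample, layers, rate=0.5, rng=rng))
    print(loss(sample, 2, layers, rate=0.5, rng=rng))
```

Dropout is applied only when `training=True`, so predictions at inference time are deterministic. The penalty term is added to the loss rather than folded into the layers, which keeps the two regularizers easy to tune independently.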