Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning

Abstract Background Cancer molecular subtyping plays a critical role in individualized patient treatment. In previous studies, high-throughput gene expression signature-based methods have been proposed to identify cancer subtypes. Unfortunately, the existing ones suffer from the curse of dimensional...


Bibliographic Details
Main Authors: Shaochuan Li, Yuning Yang, Xin Wang, Jun Li, Jun Yu, Xiangtao Li, Ka-Chun Wong
Format: Article
Language: English
Published: BMC 2022-04-01
Series: BioData Mining
Subjects: DeepCSD; Cancer subtype identification; Differential gene expression
Online Access: https://doi.org/10.1186/s13040-022-00295-w
author Shaochuan Li
Yuning Yang
Xin Wang
Jun Li
Jun Yu
Xiangtao Li
Ka-Chun Wong
author_sort Shaochuan Li
collection DOAJ
description Abstract
Background: Cancer molecular subtyping plays a critical role in individualized patient treatment. In previous studies, high-throughput gene expression signature-based methods have been proposed to identify cancer subtypes. Unfortunately, the existing ones suffer from the curse of dimensionality, data sparsity, and computational deficiency.
Methods: To address those problems, we propose a computational framework for colorectal cancer subtyping without any exploitation in model complexity and generality. A supervised learning framework based on deep learning (DeepCSD) is proposed to identify cancer subtypes. Specifically, based on the differentially expressed genes under cancer consensus molecular subtyping, we design a minimalist feed-forward neural network to capture the distinct molecular features in different cancer subtypes. To mitigate the overfitting of deep learning as much as possible, L1 and L2 regularization and dropout layers are added.
Results: To demonstrate the effectiveness of DeepCSD, we compared it with other methods, including Random Forest (RF), Deep forest (gcForest), support vector machine (SVM), XGBoost, and DeepCC, on eight independent colorectal cancer datasets. The results show that DeepCSD can achieve superior performance over the other algorithms. In addition, gene ontology enrichment and pathology analysis are conducted to reveal novel insights into the cancer subtype identification and characterization mechanisms.
Conclusions: DeepCSD considers all subtype-specific genes as input, which is pathologically necessary for its completeness. At the same time, DeepCSD shows remarkable robustness in handling cross-platform gene expression data, achieving similar performance on both training and test data without significant model overfitting or exploitation of model complexity.
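The Methods part of the abstract describes a small feed-forward network trained with L1 and L2 weight penalties and dropout. The following is only an illustrative NumPy sketch of that general idea, not the authors' DeepCSD code: the single hidden layer, the layer sizes, the ReLU activation, the penalty strengths, and the function names `init_params`, `forward`, and `loss` are all assumptions made here, and the four output classes are assumed to correspond to the CMS1 to CMS4 labels of consensus molecular subtyping.

```python
import numpy as np


def init_params(n_genes, n_hidden, n_subtypes, seed=0):
    """Small random weights for a one-hidden-layer network (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, size=(n_genes, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, size=(n_hidden, n_subtypes)),
        "b2": np.zeros(n_subtypes),
    }


def forward(params, x, drop_rate=0.0, rng=None):
    """Return class probabilities; dropout is applied only when rng is given."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    if drop_rate > 0.0 and rng is not None:
        # Inverted dropout: zero random units, rescale the survivors.
        mask = rng.random(h.shape) >= drop_rate
        h = h * mask / (1.0 - drop_rate)
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def loss(params, x, y, l1=1e-4, l2=1e-4, drop_rate=0.0, rng=None):
    """Mean cross-entropy plus L1 and L2 penalties on the weight matrices."""
    probs = forward(params, x, drop_rate, rng)
    ce = -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))
    weights = (params["W1"], params["W2"])
    l1_penalty = sum(np.abs(w).sum() for w in weights)
    l2_penalty = sum((w ** 2).sum() for w in weights)
    return ce + l1 * l1_penalty + l2 * l2_penalty


if __name__ == "__main__":
    # Synthetic stand-in for expression values of differentially expressed genes.
    data_rng = np.random.default_rng(42)
    x = data_rng.normal(size=(6, 50))      # 6 samples, 50 genes (made up)
    y = np.array([0, 1, 2, 3, 0, 1])       # hypothetical subtype labels
    params = init_params(n_genes=50, n_hidden=16, n_subtypes=4)
    print("train-mode loss:", loss(params, x, y, drop_rate=0.5, rng=data_rng))
    print("eval-mode probabilities shape:", forward(params, x).shape)
```

The L1 term pushes individual weights toward zero, the L2 term shrinks all weights smoothly, and dropout randomly silences hidden units during training; calling `forward` without an `rng` gives the deterministic evaluation-time prediction.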
first_indexed 2024-04-14T00:41:23Z
format Article
id doaj.art-3d80c10bfc744d13b2d8a0324707908d
institution Directory Open Access Journal
issn 1756-0381
language English
last_indexed 2024-04-14T00:41:23Z
publishDate 2022-04-01
publisher BMC
record_format Article
series BioData Mining
spelling doaj.art-3d80c10bfc744d13b2d8a0324707908d
2022-12-22T02:22:10Z
eng
BMC
BioData Mining
1756-0381
2022-04-01
Volume 15, issue 1, pages 1-16
10.1186/s13040-022-00295-w
Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning
Shaochuan Li (Department of Information Science and Technology, Northeast Normal University)
Yuning Yang (Department of Information Science and Technology, Northeast Normal University)
Xin Wang (Department of Surgery, Chinese University of Hong Kong)
Jun Li (Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, and School of Data Science, City University of Hong Kong)
Jun Yu (Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong)
Xiangtao Li (School of Artificial Intelligence, Jilin University)
Ka-Chun Wong (Department of Computer Science, City University of Hong Kong)
https://doi.org/10.1186/s13040-022-00295-w
DeepCSD
Cancer subtype identification
Differential gene expression
title Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning
topic DeepCSD
Cancer subtype identification
Differential gene expression
url https://doi.org/10.1186/s13040-022-00295-w