Deep learning approach for cancer subtype classification using high-dimensional gene expression data

Bibliographic Details
Main Authors: Jiquan Shen, Jiawei Shi, Junwei Luo, Haixia Zhai, Xiaoyan Liu, Zhengjiang Wu, Chaokun Yan, Huimin Luo
Format: Article
Language: English
Published: BMC 2022-10-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-022-04980-9
Description: Abstract

Motivation: Studies have shown that classifying cancer subtypes can provide valuable information for a range of cancer research, from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually adopt gene expression data to perform cancer subtype classification. However, cancer samples are scarce, and the high-dimensional features of their gene expression data are too sparse to allow most methods to achieve desirable classification results.

Results: In this paper, we propose a deep learning approach by combining a convolutional neural network (CNN) and bidirectional gated recurrent unit (BiGRU): our approach, DCGN, aims to achieve nonlinear dimensionality reduction and learn features to eliminate irrelevant factors in gene expression data. Specifically, DCGN first uses the synthetic minority oversampling technique algorithm to equalize data. The CNN can handle high-dimensional data without stress and extract important local features, and the BiGRU can analyse deep features and retain their important information; the DCGN captures key features by combining both neural networks to overcome the challenges of small sample sizes and sparse, high-dimensional features. In the experiments, we compared the DCGN to seven other cancer subtype classification methods using breast and bladder cancer gene expression datasets. The experimental results show that the DCGN performs better than the other seven methods and can provide more satisfactory classification results.
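
The abstract names the synthetic minority oversampling technique (SMOTE) as the first step for equalizing class sizes but gives no implementation details. The following is a minimal, generic sketch of the SMOTE idea in plain Python, not the authors' code: the function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy expression values are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority-class
    sample and one of its k nearest minority-class neighbours (generic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample (itself excluded).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: 3 minority-class samples with 4 gene features each.
minority_samples = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 1.0, 2.0, 5.0],
    [0.5, 3.0, 2.0, 4.0],
]
new_samples = smote(minority_samples, n_new=4, k=2)
```

Each synthetic sample lies on a line segment between two real minority samples, so the oversampled class stays inside the region already covered by observed data. In practice a maintained library implementation would likely be preferable to this hand-written sketch.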
ISSN: 1471-2105
Volume: 23, Issue: 1, Pages: 1-17
Affiliations: Jiquan Shen, Jiawei Shi, Junwei Luo, Haixia Zhai, Xiaoyan Liu, Zhengjiang Wu (School of Software, Henan Polytechnic University); Chaokun Yan, Huimin Luo (School of Computer and Information Engineering, Henan University)
Keywords: Cancer subtype; Classification; Deep learning