Colorectal Cancer Prediction Based on Weighted Gene Co-Expression Network Analysis and Variational Auto-Encoder

An effective feature extraction method is key to improving the accuracy of a prediction model. From the Gene Expression Omnibus (GEO) database, we obtained microarray gene expression data covering 13,487 genes for 238 samples, comprising colorectal cancer (CRC) samples and normal samples. Twelve gen...


Bibliographic Details
Main Authors: Dongmei Ai, Yuduo Wang, Xiaoxin Li, Hongfei Pan
Format: Article
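Language: English
Published: MDPI AG 2020-08-01
Series: Biomolecules
Subjects: weighted gene co-expression network analysis; variational autoencoder; colorectal cancer; hub genes; classifier
Online Access: https://www.mdpi.com/2218-273X/10/9/1207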
collection DOAJ
description An effective feature extraction method is key to improving the accuracy of a prediction model. From the Gene Expression Omnibus (GEO) database, we obtained microarray gene expression data covering 13,487 genes for 238 samples, comprising colorectal cancer (CRC) samples and normal samples. Twelve gene modules were obtained by weighted gene co-expression network analysis (WGCNA) on 173 samples. By calculating the Pearson correlation coefficient (PCC) between the characteristic genes of each module and CRC, we identified a key module that was highly correlated with CRC. We screened hub genes from the key module by considering module membership, gene significance, and intramodular connectivity, and selected 10 hub genes as one type of feature for the classifier. We applied a variational autoencoder (VAE) to 1159 significantly differentially expressed genes and mapped the data into a 10-dimensional representation as a second type of feature for the cancer classifier. Both types of features were fed to a support vector machine (SVM) classifier for CRC, which achieved an accuracy of 0.9692 and an AUC of 0.9981. The results show that the two-step feature extraction method, which combines hub genes obtained by WGCNA with a 10-dimensional representation learned by the VAE, yields high accuracy.
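The hub-gene screening step named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy expression values, and the cutoffs `mm_min` and `gs_min` are assumptions made for illustration, and `module_profile` is a hypothetical stand-in for the characteristic gene of the key module. Module membership is taken here as the Pearson correlation of a gene with the module profile, and gene significance as its correlation with CRC status; intramodular connectivity, the third criterion in the description, is left out for brevity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def screen_hub_genes(expr, module_profile, trait, mm_min=0.8, gs_min=0.5):
    """Return genes whose |MM| and |GS| both reach the cutoffs.

    expr: dict mapping gene name -> expression values across samples
    module_profile: summary profile of the module across the same samples
    trait: 1 for CRC, 0 for normal, one entry per sample
    mm_min, gs_min: illustrative cutoffs (assumed, not from the record)
    """
    hubs = []
    for gene, values in expr.items():
        mm = pearson(values, module_profile)  # module membership
        gs = pearson(values, trait)           # gene significance
        if abs(mm) >= mm_min and abs(gs) >= gs_min:
            hubs.append(gene)
    return hubs


# Made-up toy data: 6 samples, the first three CRC, the last three normal.
trait = [1, 1, 1, 0, 0, 0]
module_profile = [2.0, 2.2, 1.9, 0.5, 0.4, 0.6]
expr = {
    "geneA": [5.1, 5.3, 5.0, 2.0, 1.9, 2.1],  # tracks the module and the trait
    "geneB": [3.0, 1.0, 2.5, 2.8, 1.2, 2.0],  # largely unrelated to either
}
hub_genes = screen_hub_genes(expr, module_profile, trait)
```

In this toy setting only `geneA` passes both cutoffs. In a fuller pipeline, the selected hub genes would be combined with the 10-dimensional VAE representation before classification.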
first_indexed 2024-03-10T17:08:36Z
format Article
id doaj.art-3c47c79bfe844ca7b1c3c6011313a1ea
institution Directory Open Access Journal
issn 2218-273X
language English
last_indexed 2024-03-10T17:08:36Z
publishDate 2020-08-01
publisher MDPI AG
record_format Article
series Biomolecules
spelling doaj.art-3c47c79bfe844ca7b1c3c6011313a1ea
2023-11-20T10:43:41Z
eng
MDPI AG
Biomolecules, ISSN 2218-273X, 2020-08-01, vol. 10, no. 9, article 1207
DOI 10.3390/biom10091207
Colorectal Cancer Prediction Based on Weighted Gene Co-Expression Network Analysis and Variational Auto-Encoder
Dongmei Ai (0), Yuduo Wang (1), Xiaoxin Li (2), Hongfei Pan (3)
(0) Basic Experimental Center of Natural Science, University of Science and Technology Beijing, Beijing 100083, China
(1) School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
(2) School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
(3) School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
https://www.mdpi.com/2218-273X/10/9/1207
topic weighted gene co-expression network analysis
variational autoencoder
colorectal cancer
hub genes
classifier