A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data

Bibliographic Details
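Main Authors: Saurav Mallik, Soumita Seth, Tapas Bhadra, Zhongming Zhao
Affiliations: Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA (Saurav Mallik, Zhongming Zhao); Department of Computer Science & Engineering, Aliah University, Newtown WB-700160, India (Soumita Seth, Tapas Bhadra)
Format: Article
Language: English
Published: MDPI AG, 2020-08-01
Series: Genes
Citation: Genes 2020, 11(8), 931
ISSN: 2073-4425
DOI: 10.3390/genes11080931
Subjects: uterine cervical cancer; DNA methylation; linear regression; deep learning; differentially expressed genes
Online Access: https://www.mdpi.com/2073-4425/11/8/931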

Abstract
DNA methylation change has been useful for cancer biomarker discovery, classification, and potential treatment development. So far, existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address these limitations, in this study we introduced a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method for accurate biological interpretation of DNA methylation by integrating DNA methylation data with the corresponding TCGA gene expression data. We demonstrated the framework on uterine cervical cancer. First, we pre-filtered outliers from the data set and then predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with the empirical Bayes test in Limma. Then we applied a deep learning method, "nnet", to classify the cervical cancer labels of the samples using those DEGs, and computed all classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760; 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with a false discovery rate (FDR) < 0.001.
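
The linear regression and FDR steps described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' code: the abstract does not say which CpG sites feed each gene's model, so the sketch assumes a single methylation predictor per gene (for example, an averaged beta value) fitted by ordinary least squares. The p-values passed to benjamini_hochberg are assumed to come from a per-gene test such as Limma's empirical Bayes moderated t-test, which is not reimplemented here; Benjamini-Hochberg is the default adjust.method of Limma's topTable. All function names and numbers are illustrative.

```python
"""Hypothetical sketch: predict a gene's expression from one methylation
predictor by ordinary least squares, then convert per-gene p-values into
Benjamini-Hochberg FDR values. Not the authors' implementation."""
from typing import List, Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) minimising the squared error of y ~ a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("methylation values are constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(intercept: float, slope: float, x: Sequence[float]) -> List[float]:
    """Predicted expression value for each methylation value."""
    return [intercept + slope * xi for xi in x]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p-values (FDR), returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    beta = [0.10, 0.20, 0.40, 0.60, 0.80]  # hypothetical methylation beta values
    expr = [9.0, 8.0, 6.0, 4.0, 2.0]       # hypothetical log-expression values
    a, b = fit_simple_ols(beta, expr)
    print("intercept, slope:", round(a, 3), round(b, 3))
    print("predicted at beta=0.5:", predict(a, b, [0.5]))
    print("BH-adjusted:", benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

The toy values lie exactly on the line expression = 10 - 10 × beta, so the fit is easy to check by hand; real methylation data are noisy and would first go through the outlier pre-filtering step. The closed-form slope keeps the sketch dependency-free, whereas a multi-CpG model would need a full least-squares solver.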

After performing deep learning analysis, we obtained an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancer labels. This performance is better than that of the other peer methods we compared. We performed in-degree and out-degree hub gene network analysis using Cytoscape. We reported the five top in-degree genes (PAIP2, GRWD1, VPS4B, CRADD and LLPH) and the five top out-degree genes (MRPL35, FAM177A1, STAT4, ASPSCR1 and FABP7). After that, we performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using the WebGestalt (WEB-based Gene SeT AnaLysis Toolkit) tool. In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach for better interpreting DNA methylation and gene expression data in disease studies.
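
In-degree and out-degree hub analysis amounts to counting, for each gene in a directed network, how many edges point into it and how many leave it, then ranking genes by those counts. The Python sketch below assumes the network is already available as a list of (source, target) gene pairs; the abstract does not say how the edges were derived, the gene names G1 to G4 are hypothetical placeholders rather than results from the paper, and the sketch does not reproduce Cytoscape itself.

```python
"""Hypothetical sketch: rank hub genes by in-degree and out-degree in a
directed gene network given as (source, target) edges."""
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def degree_counts(edges: Iterable[Edge]) -> Tuple[Counter, Counter]:
    """Count incoming and outgoing edges per gene; duplicate edges count once."""
    in_deg: Counter = Counter()
    out_deg: Counter = Counter()
    for source, target in set(edges):
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg


def top_hubs(degrees: Counter, k: int) -> List[Tuple[str, int]]:
    """Top-k genes by degree, ties broken alphabetically for reproducibility."""
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    # Hypothetical toy network; the repeated G1 -> G2 edge is counted once.
    toy_edges = [("G1", "G2"), ("G1", "G3"), ("G3", "G2"),
                 ("G4", "G2"), ("G1", "G4"), ("G1", "G2")]
    in_deg, out_deg = degree_counts(toy_edges)
    print("top in-degree:", top_hubs(in_deg, 2))    # [('G2', 3), ('G3', 1)]
    print("top out-degree:", top_hubs(out_deg, 2))  # [('G1', 3), ('G3', 1)]
```

Breaking ties alphabetically keeps the top-k list deterministic; other tools may order tied genes differently, so a top-five list can shift when several genes share the same degree.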