A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data
DNA methylation change has been useful for cancer biomarker discovery, classification, and potential treatment development. So far, existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, s...
Main Authors: | Saurav Mallik, Soumita Seth, Tapas Bhadra, Zhongming Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2020-08-01 |
Series: | Genes |
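Subjects: | uterine cervical cancer; DNA methylation; linear regression; deep learning; differentially expressed genes |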
Online Access: | https://www.mdpi.com/2073-4425/11/8/931 |
_version_ | 1797558515879903232 |
---|---|
author | Saurav Mallik, Soumita Seth, Tapas Bhadra, Zhongming Zhao |
author_sort | Saurav Mallik |
collection | DOAJ |
description | DNA methylation change has been useful for cancer biomarker discovery, classification, and potential treatment development. So far, existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address these limitations, in this study, we introduced a combinatorial framework using linear regression, differential expression, and deep learning methods for accurate biological interpretation of DNA methylation through integrating DNA methylation data and corresponding TCGA gene expression data. We demonstrated it for uterine cervical cancer. First, we pre-filtered outliers from the data set and then determined the predicted gene expression value from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) by Empirical Bayes test using <i>Limma</i>. Then we applied a deep learning method, “<i>nnet</i>”, to classify the cervical cancer label of those DEGs and to determine all classification metrics, including accuracy and area under curve (AUC), through 10-fold cross validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760, 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with false discovery rate (FDR) < 0.001. After performing deep learning analysis, we obtained an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancerous labels. This performance is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using <i>Cytoscape</i>. We reported five top in-degree genes (<i>PAIP2</i>, <i>GRWD1</i>, <i>VPS4B</i>, <i>CRADD</i> and <i>LLPH</i>) and five top out-degree genes (<i>MRPL35</i>, <i>FAM177A1</i>, <i>STAT4</i>, <i>ASPSCR1</i> and <i>FABP7</i>). After that, we performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using the tool <i>WebGestalt</i> (WEB-based Gene SeT AnaLysis Toolkit). In summary, our proposed framework, which integrates linear regression, differential expression, and deep learning, provides a robust approach to better interpret DNA methylation analysis and gene expression data in disease study. |
first_indexed | 2024-03-10T17:32:27Z |
format | Article |
id | doaj.art-4b77398677524306acb8a8f6a154e908 |
institution | Directory Open Access Journal |
issn | 2073-4425 |
language | English |
last_indexed | 2024-03-10T17:32:27Z |
publishDate | 2020-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Genes |
spelling | doaj.art-4b77398677524306acb8a8f6a154e908; 2023-11-20T09:58:08Z; eng; MDPI AG; Genes; ISSN 2073-4425; 2020-08-01; vol. 11, no. 8, article 931; doi:10.3390/genes11080931; Saurav Mallik (Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA); Soumita Seth (Department of Computer Science & Engineering, Aliah University, Newtown WB-700160, India); Tapas Bhadra (Department of Computer Science & Engineering, Aliah University, Newtown WB-700160, India); Zhongming Zhao (Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA); https://www.mdpi.com/2073-4425/11/8/931; keywords: uterine cervical cancer, DNA methylation, linear regression, deep learning, differentially expressed genes |
title | A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data |
topic | uterine cervical cancer; DNA methylation; linear regression; deep learning; differentially expressed genes |
url | https://www.mdpi.com/2073-4425/11/8/931 |
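
The abstract in this record describes predicting gene expression from methylation by linear regression and then evaluating a classifier with 10-fold cross validation. The following is a minimal Python sketch of those two steps only, not the authors' implementation (the record states they used Limma and nnet). The function names `fit_line`, `predict`, and `k_fold_indices` and the toy values are hypothetical and chosen purely for illustration; only the sample counts (63 tumor and 152 normal) come from the record.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x for one gene.

    x: methylation values across samples, y: expression values across samples.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("methylation values are constant; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted expression for each methylation value in x."""
    return [intercept + slope * xi for xi in x]


def k_fold_indices(n_samples, k=10):
    """Partition indices 0..n_samples-1 into k folds of nearly equal size.

    Assumption: no shuffling or class stratification, to keep the sketch short.
    """
    if not 1 <= k <= n_samples:
        raise ValueError("k must be between 1 and n_samples")
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds


# Toy data (invented): expression falls as methylation rises.
methylation = [0.1, 0.2, 0.3, 0.4]
expression = [9.0, 8.0, 7.0, 6.0]

a, b = fit_line(methylation, expression)
predicted = predict(a, b, [0.5])

# 63 tumor + 152 normal samples = 215 samples split into 10 folds.
folds = k_fold_indices(63 + 152, k=10)
```

In a cross-validation loop, each fold would serve once as the held-out test set while the remaining nine folds train the classifier; averaging the ten accuracies gives a mean and spread of the kind reported in the abstract.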