regCNN: identifying Drosophila genome-wide cis-regulatory modules via integrating the local patterns in epigenetic marks and transcription factor binding motifs

Bibliographic Details
Main Authors: Tzu-Hsien Yang, Ya-Chiao Yang, Kai-Chi Tu
Format: Article
Language: English
Published: Elsevier 2022-01-01
Series: Computational and Structural Biotechnology Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037021005249
Collection: DOAJ (Directory of Open Access Journals)
Description: Transcription regulation in metazoa is controlled by the binding of transcription factors (TFs) or other regulatory proteins to specific modular DNA regulatory sequences called cis-regulatory modules (CRMs). Understanding the genome-wide distribution of CRMs is essential for constructing the metazoan transcriptional regulatory networks that help diagnose genetic disorders. Although traditional reporter-assay approaches to CRM identification can provide an in-depth understanding of the functions of individual CRMs, they are usually costly and low-throughput. It is generally believed that reliable CRM predictions can be made by integrating diverse genomic data. Hence, researchers often first turn to computational algorithms for genome-wide CRM screening before conducting specific experiments. However, existing in silico methods for finding potential CRMs are limited by low sensitivity, poor prediction accuracy, or long computation times caused by the combinatorial complexity of TF binding site (TFBS) composition. To overcome these obstacles, we designed a novel CRM identification pipeline, regCNN, which considers the base-by-base local patterns in TF binding motifs and epigenetic profiles. On the test set, regCNN achieves an accuracy of 84.5% and an auROC of 92.5% in CRM identification. By considering local patterns in epigenetic profiles and TF binding motifs, it improves the auROC by 4.7 percentage points (92.5% vs. 87.8%) over a pure multi-layer perceptron model based on average values. We also demonstrate that regCNN outperforms all currently available tools by at least 11.3% in auROC. Finally, regCNN is shown to be robust to its resizing-window hyperparameter, which is used to handle the variable lengths of CRMs. The regCNN model can be downloaded at http://cobisHSS0.im.nuk.edu.tw/regCNN/.
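The description attributes part of regCNN's gain to using base-by-base local patterns rather than average values. As a minimal sketch of why that can matter (an illustration only, not regCNN's actual architecture), the Python below compares two hypothetical signal tracks with the same mean. An average-value feature cannot tell them apart, while a single convolution filter followed by max pooling can. The filter `(-1, 2, -1)` and the helper names `window_mean`, `conv1d_valid`, and `max_pooled_response` are assumptions chosen for illustration; a CNN would learn its filters from data.

```python
# Illustrative only: hypothetical per-base signal tracks, not data or code from regCNN.

def window_mean(track):
    """Summarize a track by its average value, as an average-value feature would."""
    return sum(track) / len(track)


def conv1d_valid(track, kernel):
    """Slide `kernel` over `track` without padding; return the dot product at each offset."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, track[i:i + k]))
        for i in range(len(track) - k + 1)
    ]


def max_pooled_response(track, kernel=(-1.0, 2.0, -1.0)):
    """Global max pooling over the convolution output: the strongest local match.

    The default kernel is a hypothetical hand-set "local peak" filter.
    """
    return max(conv1d_valid(track, kernel))


# Two hypothetical 6-bp signal profiles with the same mean but different shapes.
peak_center = [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
flat = [1.0] * 6

print(window_mean(peak_center), window_mean(flat))                  # 1.0 1.0
print(max_pooled_response(peak_center), max_pooled_response(flat))  # 3.0 0.0
```

Both tracks average to 1.0, so a model fed only window means sees identical inputs, whereas the pooled filter response separates the peaked profile (3.0) from the flat one (0.0).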
ISSN: 2001-0370
Affiliation: Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan (all authors)
Corresponding author: Tzu-Hsien Yang
Volume and pages: Computational and Structural Biotechnology Journal, vol. 20, pp. 296-308
Keywords: cis-regulatory modules; Transcriptional regulation; Epigenetic regulation; Transcriptional factor binding sites