Genome-wide prediction of cis-regulatory regions using supervised deep learning methods

Abstract. Background: In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory reg...

Bibliographic Details
Main Authors: Yifeng Li, Wenqiang Shi, Wyeth W. Wasserman
Format: Article
Language: English
Published: BMC 2018-05-01
Series: BMC Bioinformatics
Subjects: cis-regulatory region; Enhancer; Promoter; Deep learning
Online Access: http://link.springer.com/article/10.1186/s12859-018-2187-1
collection DOAJ
description Abstract. Background: In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Results: Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, which is based on supervised deep learning approaches, for the identification of enhancer and promoter regions in the human genome. Because deep learning methods can discover patterns in large and complex data, their introduction enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome; 40,000 of these enhancers are supported by bidirectional transcription data) and 26,000 candidate promoters (0.6% of the genome). Conclusion: The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation, from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
first_indexed 2024-12-11T07:46:31Z
format Article
id doaj.art-cf7888513cf24443b01e82c9177e727d
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-11T07:46:31Z
publishDate 2018-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-cf7888513cf24443b01e82c9177e727d | 2022-12-22T01:15:27Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2018-05-01 | 19 | 1 | 1 | 14 | 10.1186/s12859-018-2187-1 | Genome-wide prediction of cis-regulatory regions using supervised deep learning methods | Yifeng Li (0), Wenqiang Shi (1), Wyeth W. Wasserman (2) | (0, 1, 2) Centre for Molecular Medicine and Therapeutics, BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia | http://link.springer.com/article/10.1186/s12859-018-2187-1 | cis-regulatory region; Enhancer; Promoter; Deep learning
title Genome-wide prediction of cis-regulatory regions using supervised deep learning methods
topic cis-regulatory region
Enhancer
Promoter
Deep learning