Prediction of enhancer-promoter interactions via natural language processing

Abstract Background Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from o...

Bibliographic Details
Main Authors:	Wanwen Zeng, Mengmeng Wu, Rui Jiang
Format:	Article
Language:	English
Published:	BMC 2018-05-01
Series:	BMC Genomics
Subjects:	Enhancer-promoter interactions; Three-dimensional interactions; Natural language processing; Unsupervised learning
Online Access:	http://link.springer.com/article/10.1186/s12864-018-4459-6

author	Wanwen Zeng Mengmeng Wu Rui Jiang
collection	DOAJ
description	Abstract. Background: Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important for deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is challenging to distinguish true interactions from nearby non-interacting pairs, since traditional experimental methods are limited by low resolution or low throughput. Results: We propose a novel computational framework, EP2vec, to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method from natural language processing. Then, we train a classifier to predict EPIs using the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, outperforming existing methods. We demonstrate the robustness of sequence embedding features by carrying out sensitivity analysis. In addition, we identify motifs that represent cell line-specific information by analyzing the learned sequence embedding features with an attention mechanism. Finally, we show that even better performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features and experimental features. Conclusions: EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPI identification.
first_indexed	2024-12-11T02:37:48Z
format	Article
id	doaj.art-278e9654f5ee492fa2a5f34854a5904f
institution	Directory of Open Access Journals
issn	1471-2164
language	English
last_indexed	2024-12-11T02:37:48Z
publishDate	2018-05-01
publisher	BMC
record_format	Article
series	BMC Genomics
spelling	doaj.art-278e9654f5ee492fa2a5f34854a5904f | 2022-12-22T01:23:41Z | eng | BMC | BMC Genomics | 1471-2164 | 2018-05-01 | 19 | S2 | 13 | 22 | 10.1186/s12864-018-4459-6 | Prediction of enhancer-promoter interactions via natural language processing | Wanwen Zeng, Mengmeng Wu, Rui Jiang (each: MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology) | http://link.springer.com/article/10.1186/s12864-018-4459-6 | Enhancer-promoter interactions; Three-dimensional interactions; Natural language processing; Unsupervised learning
title	Prediction of enhancer-promoter interactions via natural language processing
topic	Enhancer-promoter interactions; Three-dimensional interactions; Natural language processing; Unsupervised learning
url	http://link.springer.com/article/10.1186/s12864-018-4459-6
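
Illustrative sketch (not part of the catalog record and not the authors' code): the abstract describes mapping variable-length DNA sequences to fixed-length vectors and scoring predictions with F1. The Python below shows the general idea only. The k-mer tokenization, the normalized k-mer count vector (a simple stand-in for the learned embedding), and the function names kmer_words, fixed_length_vector and f1_score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_words(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mer 'words'.

    Assumed tokenization for illustration; the record does not specify one.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def fixed_length_vector(seq, k=3):
    """Map a sequence of any length to a vector of length 4**k.

    Uses normalized k-mer counts as a stand-in for a learned embedding;
    this is NOT the unsupervised method named in the abstract.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(kmer_words(seq, k))
    # k-mers containing other symbols (e.g. N) are ignored.
    total = sum(counts[w] for w in vocab) or 1
    return [counts[w] / total for w in vocab]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = interacting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, sequences of different lengths yield vectors of the same length (16 for k=2), which is what allows one classifier to take an enhancer vector and a promoter vector as input regardless of the original sequence lengths.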

Similar Items