iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength

Bibliographic Details
Main Authors: Runtao Yang, Feng Wu, Chengjin Zhang, Lina Zhang
Format: Article
Language: English
Published: MDPI AG, 2021-03-01
Series: International Journal of Molecular Sciences
Subjects: enhancer; word embedding; sequence generative adversarial net; convolutional neural network
Online Access: https://www.mdpi.com/1422-0067/22/7/3589

Description
As critical components of DNA, enhancers can efficiently and specifically control the spatiotemporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in many human pathologies. Identifying enhancers and their strength may therefore provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, a new predictor of enhancers and their strength, iEnhancer-GAN, is proposed based on a deep learning framework that combines word embedding with a sequence generative adversarial net (Seq-GAN). Because the training dataset is relatively small, the Seq-GAN is designed to generate artificial sequences that expand it. Given that each functional element in a DNA sequence is analogous to a “word” in linguistics, word segmentation methods are proposed to divide DNA sequences into “words”, and the skip-gram model is employed to transform these “words” into numerical vectors. Owing to its ability to extract high-level abstract features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks; the word vectors of each DNA sequence are vertically concatenated to form an embedding matrix that serves as the input to the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN in expanding the training dataset, the feasibility of applying word segmentation methods to extract “words” from DNA sequences and of using the skip-gram model to encode them, and the strong predictive ability of the CNN. Compared with other state-of-the-art methods on both the training dataset and the independent test dataset, the proposed method achieves significantly improved overall performance. The proposed method is expected to help advance enhancer-related research.
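The encoding pipeline summarized in the description (segmenting DNA into “words”, preparing skip-gram training data, stacking word vectors vertically into an embedding matrix, and scanning that matrix with a convolutional filter) can be illustrated with the minimal pure-Python sketch below. It is not the authors' implementation: overlapping 3-mers stand in for the paper's word segmentation methods, random vectors are placeholders for trained skip-gram embeddings, and the function names, window size, embedding dimension and single filter are hypothetical choices made for illustration. The Seq-GAN augmentation step is not shown.

```python
import random
from typing import Dict, List, Tuple

# Illustrative sketch only; NOT the iEnhancer-GAN implementation.
# Assumptions: overlapping k-mers as "words", random placeholder vectors
# instead of trained skip-gram embeddings, and a single CNN filter.


def segment_kmers(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mer "words" (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(words: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build the (center, context) pairs a skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, words[j]))
    return pairs


def embedding_matrix(words: List[str], vectors: Dict[str, List[float]]) -> List[List[float]]:
    """Stack word vectors vertically: row i is the vector of the i-th word."""
    return [list(vectors[w]) for w in words]


def conv1d_maxpool(matrix: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> float:
    """Slide one filter over consecutive rows (words), apply ReLU, then global max-pool."""
    width = len(kernel)
    if len(matrix) < width:
        raise ValueError("sequence has fewer words than the filter width")
    activations = []
    for i in range(len(matrix) - width + 1):
        s = bias
        for j in range(width):
            for d, w in enumerate(kernel[j]):
                s += w * matrix[i + j][d]
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # global max-pooling over positions


if __name__ == "__main__":
    words = segment_kmers("ACGTACGGT", k=3)
    print(words)  # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGG', 'GGT']
    print(len(skipgram_pairs(words, window=1)))  # 12
    rng = random.Random(0)
    dim = 4  # hypothetical embedding size
    # Placeholder vectors standing in for trained skip-gram embeddings.
    vectors = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in sorted(set(words))}
    M = embedding_matrix(words, vectors)  # shape: 7 words x 4 dims
    kernel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]  # filter spans 2 words
    print(conv1d_maxpool(M, kernel))
```

In a real model the placeholder vectors would be replaced by embeddings learned from such pairs, for example with a word2vec implementation run in skip-gram mode (gensim's `Word2Vec` with `sg=1`), and the CNN would use many filters of several widths followed by dense layers for classification.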

ISSN: 1661-6596, 1422-0067
Volume/Issue: 22(7), article 3589
DOI: 10.3390/ijms22073589
Affiliation (all authors): School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai 264209, China