Deep learning based DNA:RNA triplex forming potential prediction

Background: Long non-coding RNAs (lncRNAs) can exert functions via forming triplex with DNA. The current methods in predicting the triplex formation mainly rely on mathematic statistic according to the base paring rules. However, these methods have two main limitations: (1) they identify a large num...

Full description

Bibliographic Details
Main Authors: Zhang, Yu, Long, Yahui, Kwoh, Chee Keong
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language:English
Published: 2021
Subjects:
Online Access:https://hdl.handle.net/10356/145881
Description
Summary:Background: Long non-coding RNAs (lncRNAs) can exert functions via forming triplex with DNA. The current methods in predicting the triplex formation mainly rely on mathematic statistic according to the base paring rules. However, these methods have two main limitations: (1) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNA indicates that maybe not all of them can form triplex in practice, and (2) their predictions only consider the theoretical relationship while lacking the features from the experimentally verified data. Results: In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model in DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on the experimentally verified data, where the high-level features are learned by the convolutional neural networks. In the fivefold cross validation, the average values of Area Under the ROC curves and PRC curves for removed redundancy triplex-forming lncRNA dataset with threshold 0.8 are 0.9649 and 0.9996, and these two values for triplex DNA sites prediction are 0.8705 and 0.9671, respectively. Besides, we also briefly summarize the cis and trans targeting of triplexes lncRNAs. Conclusions: The TriplexFPP is able to predict the most likely triplex-forming lncRNAs from all the lncRNAs with computationally defined triplex forming capacities and the potential of a DNA site to become a triplex. It may provide insights to the exploration of lncRNA functions.