Predicting Long non-coding RNAs through feature ensemble learning


Bibliographic Details
Main Authors: Yanzhen Xu, Xiaohan Zhao, Shuai Liu, Wen Zhang
Format: Article
Language: English
Published: BMC 2020-12-01
Series: BMC Genomics
Subjects: lncRNA prediction; Attention mechanism; Feature ensemble learning
Online Access: https://doi.org/10.1186/s12864-020-07237-y
collection DOAJ
description Abstract. Background: Advances in sequencing technologies have generated large numbers of transcripts, and long non-coding RNAs (lncRNAs) are an important type of transcript. Predicting lncRNAs from transcripts is a challenging and important task. Traditional experimental methods for identifying lncRNAs are time-consuming and labor-intensive, so efficient computational methods for lncRNA prediction are in demand. Results: We propose two lncRNA prediction methods based on feature ensemble learning strategies, LncPred-IEL and LncPred-ANEL. Sequences are encoded into six types of features, including transcript-specified features and general sequence-derived features. Two feature ensemble strategies are then used to integrate the information in the different feature types: iterative ensemble learning (IEL) and attention network ensemble learning (ANEL). IEL uses a supervised iterative procedure to combine base predictors built on the six feature types. ANEL uses an attention mechanism-based deep learning model that combines features by adaptively learning the weight of each feature type. Experiments demonstrate that both LncPred-IEL and LncPred-ANEL can effectively separate lncRNAs from other transcripts in feature space. Comparison experiments further demonstrate that both methods outperform several state-of-the-art methods when evaluated by 5-fold cross-validation, and both perform well in cross-species lncRNA prediction. Conclusions: LncPred-IEL and LncPred-ANEL are promising lncRNA prediction tools that can effectively utilize and integrate the information in different types of features.
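The abstract names two ideas that a short sketch may make more concrete: encoding a transcript as a sequence-derived feature vector, and combining several feature types with learned attention weights. The Python below is only an illustration under stated assumptions, not the authors' implementation. The record does not list the six feature types or the ANEL architecture, so k-mer frequency is assumed here as one plausible sequence-derived feature, and the function names `kmer_frequencies` and `attention_fuse` and the `query` vector are hypothetical.

```python
import numpy as np
from itertools import product


def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequency vector over the alphabet ACGU.

    Assumption: k-mer frequency stands in for one "general
    sequence-derived feature"; the record does not name the features.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing ambiguous bases such as N
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_fuse(embeddings, query):
    """Fuse per-feature-type embeddings with attention weights.

    embeddings: array of shape (n_types, d), one row per feature type.
    query: array of shape (d,), a hypothetical learned vector.
    Returns (weights, fused): weights sum to 1 across feature types,
    and fused is the weighted sum of the embedding rows.
    """
    scores = embeddings @ query   # one relevance score per feature type
    weights = softmax(scores)     # adaptive weight of each feature type
    fused = weights @ embeddings  # weighted combination, shape (d,)
    return weights, fused


# Example: six feature types, each already projected to 8 dimensions.
rng = np.random.default_rng(0)
type_embeddings = rng.normal(size=(6, 8))
weights, fused = attention_fuse(type_embeddings, rng.normal(size=8))
```

In a trained model the query vector and the projections that produce the per-type embeddings would be learned jointly with the classifier; here they are random placeholders so the weighting step can be read in isolation.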
first_indexed 2024-12-16T23:57:11Z
id doaj.art-cd5d5f88c1cd4374aea62ed3fc109ae0
institution Directory Open Access Journal
issn 1471-2164
last_indexed 2024-12-16T23:57:11Z
spelling doaj.art-cd5d5f88c1cd4374aea62ed3fc109ae0; 2022-12-21T22:11:10Z; eng; BMC; BMC Genomics; 1471-2164; 2020-12-01; vol. 21, no. S13, pp. 1-12; 10.1186/s12864-020-07237-y; Yanzhen Xu, Xiaohan Zhao, Shuai Liu, Wen Zhang (College of Informatics, Huazhong Agricultural University)
topic lncRNA prediction
Attention mechanism
Feature ensemble learning