A deep learning approach for orphan gene identification in moso bamboo (Phyllostachys edulis) based on the CNN + Transformer model

Abstract Background Orphan genes play an important role in the response to environmental stresses in many species, and their identification is a critical step toward understanding their biological functions. Moso bamboo has high ecological, economic and cultural value. Studies have shown that the growth of moso bamboo is infl...

Bibliographic Details
Main Authors: Xiaodan Zhang, Jinxiang Xuan, Chensong Yao, Qijuan Gao, Lianglong Wang, Xiu Jin, Shaowen Li
Format: Article
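Language: English
Published: BMC 2022-05-01
Series: BMC Bioinformatics
Subjects: Orphan genes; Moso bamboo; Deep learning; Convolutional neural network; Transformer neural network
Online Access: https://doi.org/10.1186/s12859-022-04702-1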
collection DOAJ
description Abstract Background Orphan genes play an important role in the response to environmental stresses in many species, and their identification is a critical step toward understanding their biological functions. Moso bamboo has high ecological, economic and cultural value. Studies have shown that the growth of moso bamboo is influenced by various stresses. Several traditional methods for identifying orphan genes are time-consuming and inefficient. Hence, the development of efficient and accurate computational methods for predicting orphan genes is of great significance. Results In this paper, we propose a novel deep learning model (CNN + Transformer) for identifying orphan genes in moso bamboo. It combines a convolutional neural network with a transformer neural network to capture k-mer amino acid features and the relationships between k-mers in protein sequences. The experimental results show that, on the moso bamboo dataset, CNN + Transformer reaches an average balanced accuracy of 0.875 and an average Matthews Correlation Coefficient (MCC) of 0.471. On the same testing set, the Balanced Accuracy (BA), Geometric Mean (GM), Bookmaker Informedness (BM), and MCC values of the recurrent neural network, long short-term memory, gated recurrent unit, and transformer models are all lower than those of CNN + Transformer, which indicates that the model is well suited to orphan gene (OG) identification in moso bamboo. Conclusions The CNN + Transformer model is feasible and produces credible predictive results. It may also provide a valuable reference for other related research. To our knowledge, this is the first model to adopt deep learning techniques for identifying orphan genes in plants.
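The four metrics named in the abstract (BA, GM, BM, MCC) are commonly used for imbalanced binary classification. The following is a minimal sketch of their standard textbook definitions computed from confusion-matrix counts. The function name `imbalance_metrics` and the example counts are illustrative assumptions and are not taken from the article; in particular, GM is assumed here to be the geometric mean of sensitivity and specificity, which the record itself does not confirm.

```python
import math


def imbalance_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute BA, GM, BM and MCC from confusion-matrix counts.

    Hypothetical helper using the standard definitions; it is not the
    authors' evaluation code.
    """
    # Sensitivity (true positive rate) and specificity (true negative rate).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Balanced accuracy: mean of the two per-class recall values.
    ba = (sensitivity + specificity) / 2
    # Geometric mean of sensitivity and specificity (assumed definition).
    gm = math.sqrt(sensitivity * specificity)
    # Bookmaker informedness (Youden's J statistic).
    bm = sensitivity + specificity - 1

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"BA": ba, "GM": gm, "BM": bm, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts for an imbalanced test set (10 positives, 100 negatives).
    for name, value in imbalance_metrics(tp=8, fp=10, tn=90, fn=2).items():
        print(f"{name}: {value:.3f}")
```

Because BA, GM and BM depend only on the per-class rates, they are not inflated by a large majority class, which is why they are often reported alongside MCC when positives (here, orphan genes) are rare.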
first_indexed 2024-04-13T08:42:11Z
format Article
id doaj.art-6834f78f92044a6c84d435f2eda2279f
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T08:42:11Z
publishDate 2022-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-6834f78f92044a6c84d435f2eda2279f; 2022-12-22T02:53:52Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2022-05-01; 231119; 10.1186/s12859-022-04702-1
affiliation Xiaodan Zhang (0): Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agriculture University
Jinxiang Xuan (1): Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agriculture University
Chensong Yao (2): Graduate School, Anhui Agricultural University
Qijuan Gao (3): Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agriculture University
Lianglong Wang (4): Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agriculture University
Xiu Jin (5): Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agriculture University
Shaowen Li (6): Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agriculture University
topic Orphan genes
Moso bamboo
Deep learning
Convolutional neural network
Transformer neural network
url https://doi.org/10.1186/s12859-022-04702-1