A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations

Circular RNA (circRNA) is a type of single-stranded RNA with a closed circular structure. Recent studies have shown that circRNA has a relatively more stable structure than its linear counterparts. The circRNA has become a biological marker in medicine and plays a crucial role in disease prediction....

Bibliographic Details
Main Authors: Chen Ma, Yuhong Chi, Donglai Hao, Xiongfei Ji
Format: Article
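Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: circRNA-disease; circRNA; light gradient boosting machine; self-attention neural network; transformer; feature selection
Online Access: https://ieeexplore.ieee.org/document/10124200/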
author Chen Ma
Yuhong Chi
Donglai Hao
Xiongfei Ji
collection DOAJ
description Circular RNA (circRNA) is a type of single-stranded RNA with a closed circular structure. Recent studies have shown that circRNA has a relatively more stable structure than its linear counterparts. The circRNA has become a biological marker in medicine and plays a crucial role in disease prediction. However, traditional biological experiments are often time-consuming and laborious. More researchers are taking computational approaches to predict circRNA-disease associations more rapidly and reliably. In this paper, we propose a novel method, LGFRCDA, for predicting circRNA-disease associations based on feature selection with the Light Gradient Boosting Machine (LightGBM) and a self-attention neural network, the Transformer. First, the histogram-based decision tree algorithm in LightGBM discretizes the continuous floating-point circRNA-disease features into integer-valued histogram bins. While traversing samples, the difference between histograms is used to optimize the calculation, greatly improving the construction speed. Then a leaf-wise algorithm is employed to find the node with the maximum split gain, resulting in the final feature vector. Finally, these features are sorted in order of importance and fed into the Transformer for information fusion and prediction. Our study demonstrates that, after feature processing and dimension reduction, LGFRCDA achieved an AUC (area under the receiver operating characteristic curve) of 95.44%, which is 3.11% higher than the latest algorithms on the same dataset. We also searched the published literature to validate the predicted results. Of the top 15 circRNA-disease pairs predicted by the LGFRCDA model, 13 were confirmed by existing literature. These results indicate that the proposed model is suitable for predicting circRNA-disease associations and can provide reliable candidates for biological experiments.
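The histogram step described in the abstract can be illustrated with a small standalone sketch. It is not the authors' code and not LightGBM's implementation: the equal-width binning, the toy feature values and gradients, and every function name below are assumptions chosen for illustration. It shows how floating-point feature values might be mapped to integer bins, and how a sibling node's histogram can be obtained by subtracting one child's histogram from the parent's instead of traversing the sibling's samples again.

```python
# Illustrative sketch only: hypothetical names, toy data, equal-width bins.
# LightGBM picks bin boundaries differently; this is not its implementation.
from bisect import bisect_right


def make_bin_edges(values, n_bins):
    """Return n_bins - 1 equal-width interior edges over the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def to_bin(value, edges):
    """Map a float to an integer bin index in the range 0..len(edges)."""
    return bisect_right(edges, value)


def build_histogram(bin_ids, gradients, n_bins):
    """Accumulate [gradient sum, sample count] for every bin."""
    hist = [[0.0, 0] for _ in range(n_bins)]
    for b, g in zip(bin_ids, gradients):
        hist[b][0] += g
        hist[b][1] += 1
    return hist


def subtract_histograms(parent, child):
    """Sibling histogram = parent histogram - known child histogram."""
    return [[pg - cg, pc - cc] for (pg, pc), (cg, cc) in zip(parent, child)]


# Toy data (made up): one continuous feature and one gradient per sample.
feature = [0.10, 0.35, 0.40, 0.80, 0.95, 0.55]
gradients = [0.5, -0.25, 0.75, -0.5, 0.25, -1.0]
n_bins = 4

edges = make_bin_edges(feature, n_bins)
bins = [to_bin(v, edges) for v in feature]

# Histogram for all samples at the parent node.
parent = build_histogram(bins, gradients, n_bins)

# Suppose a split sends the first three samples to the left child.
left_idx, right_idx = [0, 1, 2], [3, 4, 5]
left = build_histogram([bins[i] for i in left_idx],
                       [gradients[i] for i in left_idx], n_bins)

# The right child's histogram comes from a subtraction, with no new pass
# over the right child's samples.
right_by_subtraction = subtract_histograms(parent, left)

# Direct construction, kept here only to check the subtraction result.
right_direct = build_histogram([bins[i] for i in right_idx],
                               [gradients[i] for i in right_idx], n_bins)
```

In this sketch only the smaller child would need to be scanned in practice; the cost of the subtraction depends on the number of bins rather than the number of samples, which is the kind of saving the abstract attributes to using the difference between histograms.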
first_indexed 2024-03-13T10:23:59Z
format Article
id doaj.art-919b23271485410a83a974c01006a35f
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T10:23:59Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-919b23271485410a83a974c01006a35f; 2023-05-19T23:00:53Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; volume 11, pages 47187-47201; doi:10.1109/ACCESS.2023.3275967; 10124200
A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations
Chen Ma (0), https://orcid.org/0000-0002-0598-127X, School of Computer Science, Xijing University, Xi’an, Shaanxi, China
Yuhong Chi (1), https://orcid.org/0000-0001-7291-9137, School of Computer Science, Xijing University, Xi’an, Shaanxi, China
Donglai Hao (2), School of Computer Science, Xijing University, Xi’an, Shaanxi, China
Xiongfei Ji (3), The 20th Research Institute of China Electronics Technology Group Corporation, Xi’an, Shaanxi, China
https://ieeexplore.ieee.org/document/10124200/
circRNA-disease; circRNA; light gradient boosting machine; self-attention neural network; transformer; feature selection
title A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations
topic circRNA-disease
circRNA
light gradient boosting machine
self-attention neural network
transformer
feature selection
url https://ieeexplore.ieee.org/document/10124200/