A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations
Circular RNA (circRNA) is a type of single-stranded RNA with a closed circular structure. Recent studies have shown that circRNA has a relatively more stable structure than its linear counterparts. The circRNA has become a biological marker in medicine and plays a crucial role in disease prediction....
Main Authors: | Chen Ma, Yuhong Chi, Donglai Hao, Xiongfei Ji |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
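Subjects: | circRNA-disease, circRNA, light gradient boosting machine, self-attention neural network, transformer, feature selection |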
Online Access: | https://ieeexplore.ieee.org/document/10124200/ |
_version_ | 1797823435734253568 |
---|---|
author | Chen Ma Yuhong Chi Donglai Hao Xiongfei Ji |
author_facet | Chen Ma Yuhong Chi Donglai Hao Xiongfei Ji |
author_sort | Chen Ma |
collection | DOAJ |
description | Circular RNA (circRNA) is a type of single-stranded RNA with a closed circular structure. Recent studies have shown that circRNA is structurally more stable than its linear counterparts. circRNA has become a biological marker in medicine and plays a crucial role in disease prediction. However, traditional biological experiments are often time-consuming and laborious, so more researchers are turning to computational approaches to predict circRNA-disease associations more rapidly and reliably. In this paper, we propose a novel method for predicting circRNA-disease associations based on feature selection with the Light Gradient Boosting Machine (LightGBM) and a self-attention neural network, the Transformer (LGFRCDA). First, the histogram-based decision tree algorithm in LightGBM discretizes the continuous floating-point circRNA-disease features into integer-valued histogram bins. While traversing samples, the difference between histograms is used to reduce the computation, greatly improving construction speed. Then a leaf-wise growth strategy selects the node with the maximum split gain, resulting in the final feature vector. Finally, these features are sorted in order of importance and fed into the Transformer for information fusion and prediction. Our study demonstrates that, after feature processing and dimension reduction, LGFRCDA achieved an AUC (area under the receiver operating characteristic curve) of 95.44%, which is 3.11% higher than the latest algorithms on the same dataset. We also searched the published literature to verify the predicted results. Of the top 15 circRNA-disease pairs predicted by the LGFRCDA model, 13 were confirmed by existing literature. These results indicate that the proposed model is suitable for predicting circRNA-disease associations and can provide reliable candidates for biological experiments. |
first_indexed | 2024-03-13T10:23:59Z |
format | Article |
id | doaj.art-919b23271485410a83a974c01006a35f |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-13T10:23:59Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-919b23271485410a83a974c01006a35f 2023-05-19T23:00:53Z eng IEEE IEEE Access 2169-3536 2023-01-01 11 47187 47201 10.1109/ACCESS.2023.3275967 10124200 A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations Chen Ma 0 https://orcid.org/0000-0002-0598-127X Yuhong Chi 1 https://orcid.org/0000-0001-7291-9137 Donglai Hao 2 Xiongfei Ji 3 School of Computer Science, Xijing University, Xi’an, Shaanxi, China School of Computer Science, Xijing University, Xi’an, Shaanxi, China School of Computer Science, Xijing University, Xi’an, Shaanxi, China The 20th Research Institute of China Electronics Technology Group Corporation, Xi’an, Shaanxi, China Circular RNA (circRNA) is a type of single-stranded RNA with a closed circular structure. Recent studies have shown that circRNA has a relatively more stable structure than its linear counterparts. The circRNA has become a biological marker in medicine and plays a crucial role in disease prediction. However, traditional biological experiments are often time-consuming and laborious. More researchers are taking computational approaches to predict the circRNA-disease associations more rapidly and reliably. In this paper, we propose a novel method for predicting the circRNA-disease associations based on the feature selection using Light Gradient Boosting Machine (LightGBM) and a self-attention neural network-Transformer (LGFRCDA). Firstly, the histogram-based decision tree algorithm in LightGBM is used to discretize the continuous floating-point features in circRNA-disease into the histogram of integer numbers. While traversing samples, the difference between histograms is used to optimize the calculation, greatly improving the construction speed. Then a leaf-wise algorithm is employed to calculate the node with the maximum split gain, resulting in the final feature vector. Finally, these features are sorted in order of importance and introduced into the Transformer for information fusion and prediction. Our study demonstrates that after feature processing and dimension reduction, LGFRCDA achieved a prediction accuracy of 95.44% for AUC (Area Under the receiver operating characteristic Curve), which is 3.11% higher than the latest algorithms for the same dataset. We also conducted a search in published literature to cross-validate the predicted result. Out of the top 15 circRNA-disease pairs predicted by the LGFRCDA model, 13 were confirmed by existing literature. These results indicate that the proposed model is suitable for predicting circRNA-disease associations and can provide reliable candidates for biological experiments. https://ieeexplore.ieee.org/document/10124200/ circRNA-disease circRNA light gradient boosting machine self-attention neural network transformer feature selection |
spellingShingle | Chen Ma Yuhong Chi Donglai Hao Xiongfei Ji A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations IEEE Access circRNA-disease circRNA light gradient boosting machine self-attention neural network transformer feature selection |
title | A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations |
title_full | A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations |
title_fullStr | A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations |
title_full_unstemmed | A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations |
title_short | A New Approach Based on Feature Selection of Light Gradient Boosting Machine and Transformer to Predict circRNA-Disease Associations |
title_sort | new approach based on feature selection of light gradient boosting machine and transformer to predict circrna disease associations |
topic | circRNA-disease circRNA light gradient boosting machine self-attention neural network transformer feature selection |
url | https://ieeexplore.ieee.org/document/10124200/ |
work_keys_str_mv | AT chenma anewapproachbasedonfeatureselectionoflightgradientboostingmachineandtransformertopredictcircrnadiseaseassociations AT yuhongchi anewapproachbasedonfeatureselectionoflightgradientboostingmachineandtransformertopredictcircrnadiseaseassociations AT donglaihao anewapproachbasedonfeatureselectionoflightgradientboostingmachineandtransformertopredictcircrnadiseaseassociations AT xiongfeiji anewapproachbasedonfeatureselectionoflightgradientboostingmachineandtransformertopredictcircrnadiseaseassociations AT chenma newapproachbasedonfeatureselectionoflightgradientboostingmachineandtransformertopredictcircrnadiseaseassociations AT yuhongchi newapproachbasedonfeatureselectionoflightgradientboostingmachineandtransformertopredictcircrnadiseaseassociations AT donglaihao newapproachbasedonfeatureselectionoflightgradientboostingmachineandtransformertopredictcircrnadiseaseassociations AT xiongfeiji newapproachbasedonfeatureselectionoflightgradientboostingmachineandtransformertopredictcircrnadiseaseassociations |
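
The description field in this record outlines three LightGBM ideas: binning continuous features into integer histograms, obtaining one child's histogram by subtracting its sibling's from the parent's, and choosing the split with the maximum gain before ranking features by importance. The pure-Python sketch below illustrates those ideas on toy data. It is not the authors' LGFRCDA code and does not call the LightGBM library; every function name, the equal-width binning, the unit-hessian gain formula, and the use of a single root-split gain as an importance proxy are simplifying assumptions made for illustration.

```python
# Illustrative sketch only: all names and simplifications here are assumptions,
# not taken from the paper or from the LightGBM API.

def bin_feature(values, n_bins):
    """Map continuous values to integer bins 0..n_bins-1 (equal-width, an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def build_histogram(bins, grads, rows, n_bins):
    """Accumulate [gradient sum, sample count] per bin over the given row indices."""
    hist = [[0.0, 0] for _ in range(n_bins)]
    for i in rows:
        hist[bins[i]][0] += grads[i]
        hist[bins[i]][1] += 1
    return hist

def subtract_histograms(parent, child):
    """Sibling histogram = parent histogram - child histogram (no pass over samples)."""
    return [[p[0] - c[0], p[1] - c[1]] for p, c in zip(parent, child)]

def best_split(hist, lam=1.0):
    """Scan bin boundaries and return (gain, last bin of the left child).

    Assumes unit hessians, so each leaf scores G^2 / (n + lam)."""
    total_g = sum(g for g, _ in hist)
    total_n = sum(n for _, n in hist)
    score = lambda g, n: g * g / (n + lam)
    best = (0.0, None)
    left_g, left_n = 0.0, 0
    for b in range(len(hist) - 1):
        left_g += hist[b][0]
        left_n += hist[b][1]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave samples on both sides
        gain = score(left_g, left_n) + score(right_g, right_n) - score(total_g, total_n)
        if gain > best[0]:
            best = (gain, b)
    return best

def rank_features(columns, grads, n_bins):
    """Order feature indices by root-split gain (a crude stand-in for importance)."""
    rows = range(len(grads))
    gains = [best_split(build_histogram(bin_feature(c, n_bins), grads, rows, n_bins))[0]
             for c in columns]
    return sorted(range(len(columns)), key=lambda j: -gains[j])

# Toy data: feature 0 separates the two gradient groups, feature 1 does not.
N_BINS = 4
feature0 = [0.1, 0.4, 0.35, 0.8, 0.95, 0.6]
feature1 = [0.5, 0.1, 0.9, 0.2, 0.8, 0.4]
grads = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]

bins = bin_feature(feature0, N_BINS)
parent = build_histogram(bins, grads, range(6), N_BINS)
gain, threshold = best_split(parent)

# Split the rows, build only the left histogram, and derive the right one by subtraction.
left_rows = [i for i in range(6) if bins[i] <= threshold]
right_rows = [i for i in range(6) if bins[i] > threshold]
left = build_histogram(bins, grads, left_rows, N_BINS)
right = subtract_histograms(parent, left)

# Leaf-wise growth: of the current leaves, the one with the largest gain is split next.
next_leaf = max(("left", left), ("right", right), key=lambda leaf: best_split(leaf[1])[0])

ranking = rank_features([feature0, feature1], grads, N_BINS)
```

In this sketch the subtraction step touches only `N_BINS` entries rather than every sample in the right child, which is the kind of saving the description attributes to the histogram difference. The real library's binning strategy and importance measures differ from these simplifications.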