An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning
In drug discovery, drug-induced liver injury (DILI) remains an active research field and one of the most common and important issues in toxicity evaluation. It directly contributes to the high attrition rate of drug candidates. At present, there are a variety of computer algorithms ba...
Main Authors: | Bowei Yan, Xiaona Ye, Jing Wang, Junshan Han, Lianlian Wu, Song He, Kunhong Liu, Xiaochen Bo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Molecules |
Subjects: | DILI, genetic algorithm, ensemble learning, PCA/MCA, QSAR, molecular representation |
Online Access: | https://www.mdpi.com/1420-3049/27/10/3112 |
_version_ | 1827667555979886592 |
---|---|
author | Bowei Yan, Xiaona Ye, Jing Wang, Junshan Han, Lianlian Wu, Song He, Kunhong Liu, Xiaochen Bo |
author_sort | Bowei Yan |
collection | DOAJ |
description | In drug discovery, drug-induced liver injury (DILI) remains an active research field and one of the most common and important issues in toxicity evaluation. It directly contributes to the high attrition rate of drug candidates. A variety of computational algorithms based on molecular representations currently exist for predicting DILI. A single molecular representation has been found insufficient for toxicity prediction, so fusions of multiple molecular fingerprints have been used as model input. To address the high dimensionality and class imbalance of DILI prediction data, this paper integrates existing datasets and designs a new algorithm framework, Rotation-Ensemble-GA (R-E-GA). The main idea is to rotate the fused high-dimensional molecular representation vector in feature space and then find a feature subset with better predictive performance. An AdaBoost-type ensemble learning method is then integrated into R-E-GA to improve prediction accuracy. Experimental results show that R-E-GA outperforms other state-of-the-art algorithms, including ensemble learning-based and graph neural network-based methods. Under five-fold cross-validation, R-E-GA obtains an accuracy (ACC) of 0.77, an F1 score of 0.769, and an AUC of 0.842. |
first_indexed | 2024-03-10T03:19:59Z |
format | Article |
id | doaj.art-3f026b8ed5ce409a96311ae7f8d859fe |
institution | Directory Open Access Journal |
issn | 1420-3049 |
language | English |
last_indexed | 2024-03-10T03:19:59Z |
publishDate | 2022-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-3f026b8ed5ce409a96311ae7f8d859fe; 2023-11-23T12:21:12Z; eng; MDPI AG; Molecules; ISSN 1420-3049; 2022-05-01; vol. 27, no. 10, art. 3112; DOI 10.3390/molecules27103112; An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning; Bowei Yan (Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China); Xiaona Ye (School of Informatics, Xiamen University, Xiamen 361005, China); Jing Wang (School of Medicine, Tsinghua University, Beijing 100084, China); Junshan Han (Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China); Lianlian Wu (Institute of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China); Song He (Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China); Kunhong Liu (School of Informatics, Xiamen University, Xiamen 361005, China); Xiaochen Bo (Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China); https://www.mdpi.com/1420-3049/27/10/3112; DILI, genetic algorithm, ensemble learning, PCA/MCA, QSAR, molecular representation |
title | An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning |
topic | DILI, genetic algorithm, ensemble learning, PCA/MCA, QSAR, molecular representation |
url | https://www.mdpi.com/1420-3049/27/10/3112 |
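
The abstract describes a genetic algorithm (GA) that searches for a feature subset with better predictive performance. The following is a minimal, standard-library-only sketch of that one idea, not the authors' R-E-GA implementation: the rotation step and the AdaBoost-type ensemble are omitted, the data are synthetic, and a nearest-centroid classifier stands in as the fitness model. All names (`make_toy_data`, `centroid_accuracy`, `ga_select`) and all parameter values are hypothetical and chosen for illustration only.

```python
import random


def make_toy_data(n_samples=120, n_features=12, n_informative=3, seed=0):
    """Synthetic stand-in for fused fingerprint vectors (NOT the paper's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternating labels keep both classes balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):  # only the first columns carry signal
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Accuracy of a nearest-centroid classifier using only the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_test)


def ga_select(X_train, y_train, X_val, y_val, pop_size=20, generations=15,
              mutation_rate=0.05, seed=1):
    """Search binary feature masks with a simple GA; returns (mask, fitness)."""
    rng = random.Random(seed)
    n = len(X_train[0])

    def fitness(mask):
        acc = centroid_accuracy(X_train, y_train, X_val, y_val, mask)
        return acc - 0.01 * sum(mask) / n  # small penalty for larger subsets

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]   # truncation selection
        children = [ranked[0][:]]          # elitism: carry over the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]     # bit-flip mutation
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best, fitness(best)


# Usage: hold out the last 40 samples as a validation split for the fitness.
X, y = make_toy_data()
mask, score = ga_select(X[:80], y[:80], X[80:], y[80:])
print("selected feature indices:", [j for j, keep in enumerate(mask) if keep])
print("penalised validation accuracy:", round(score, 3))
```

In a fuller pipeline along the lines the abstract describes, the rows of `X` would be the rotated fusion vectors, the fitness would come from cross-validated performance of the chosen classifier, and the selected subset would feed the ensemble stage; those parts are assumptions about how the pieces connect rather than details given in this record.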