An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning

Bibliographic Details
Main Authors: Bowei Yan, Xiaona Ye, Jing Wang, Junshan Han, Lianlian Wu, Song He, Kunhong Liu, Xiaochen Bo
Format: Article
Language: English
Published: MDPI AG, 2022-05-01
Series: Molecules
Online Access: https://www.mdpi.com/1420-3049/27/10/3112
collection DOAJ
description In drug discovery, drug-induced liver injury (DILI) remains an active research field and is one of the most common and important issues in toxicity evaluation. It directly contributes to the high attrition rate of drug candidates. A variety of computational algorithms based on molecular representations are currently used to predict DILI. Because a single molecular representation has been found insufficient for toxicity prediction, methods that fuse multiple molecular fingerprints have been used as model input. To address the high dimensionality and class imbalance of DILI prediction data, this paper integrates existing datasets and designs a new algorithm framework, Rotation-Ensemble-GA (R-E-GA). The main idea is to rotate the fused high-dimensional molecular representation vector in feature space and then find a feature subset with better predictive performance. An AdaBoost-type ensemble learning method is then integrated into R-E-GA to improve prediction accuracy. Experimental results show that R-E-GA outperforms other state-of-the-art algorithms, including ensemble learning-based and graph neural network-based methods. In five-fold cross-validation, R-E-GA achieves an accuracy (ACC) of 0.77, an F1 score of 0.769, and an AUC of 0.842.
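The record does not spell out how the feature-subset search works. As a rough illustration of the genetic-algorithm search named in the title, here is a minimal Python sketch that assumes the rotated, fused molecular vectors are already available as plain numeric rows. Everything in it is hypothetical and not taken from R-E-GA: the names `subset_fitness` and `ga_select`, the nearest-centroid fitness with a small subset-size penalty, the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter values, and the toy data. The rotation step and the AdaBoost-type ensemble are not shown.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = List[int]  # one bit per feature: 1 = keep, 0 = drop


def subset_fitness(X: Sequence[Sequence[float]], y: Sequence[int], mask: Mask) -> float:
    """Hypothetical fitness: nearest-centroid training accuracy on the kept
    features, minus a small penalty proportional to the fraction of features kept."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    centroids: Dict[int, List[float]] = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]

    def sq_dist(x: Sequence[float], c: List[float]) -> float:
        return sum((x[j] - m) ** 2 for j, m in zip(idx, c))

    correct = sum(
        min(centroids, key=lambda k: sq_dist(x, centroids[k])) == t for x, t in zip(X, y)
    )
    return correct / len(X) - 0.01 * len(idx) / len(mask)


def ga_select(n_features: int, fitness: Callable[[Mask], float], pop_size: int = 20,
              generations: int = 30, p_mut: float = 0.05, seed: int = 0) -> Tuple[Mask, float]:
    """Hypothetical generational GA over bit-mask feature subsets.
    Requires n_features >= 2 (crossover point) and pop_size >= 2 (tournaments)."""
    rng = random.Random(seed)
    cache: Dict[Tuple[int, ...], float] = {}

    def fit(m: Mask) -> float:
        # Memoize so each distinct subset is evaluated once.
        key = tuple(m)
        if key not in cache:
            cache[key] = fitness(m)
        return cache[key]

    def repair(m: Mask) -> Mask:
        # Never allow an empty subset: switch on one random feature.
        if not any(m):
            m[rng.randrange(n_features)] = 1
        return m

    def tournament(pop: List[Mask]) -> Mask:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fit(a) >= fit(b) else b

    pop = [repair([rng.randint(0, 1) for _ in range(n_features)]) for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask so far survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in p1[:cut] + p2[cut:]]
            children.append(repair(child))  # bit-flip mutation applied above
        pop = children
        best = max(pop, key=fit)  # elitism guarantees this never gets worse
    return best, fit(best)


# Invented toy data: features 0 and 1 carry the label, features 2-7 are noise.
data_rng = random.Random(1)
X: List[List[float]] = []
y: List[int] = []
for i in range(40):
    label = i % 2
    signal = [2.0 * label + data_rng.gauss(0, 0.3), -2.0 * label + data_rng.gauss(0, 0.3)]
    X.append(signal + [data_rng.gauss(0, 1.0) for _ in range(6)])
    y.append(label)

mask, score = ga_select(len(X[0]), lambda m: subset_fitness(X, y, m))
print("selected features:", [j for j, bit in enumerate(mask) if bit], "fitness:", round(score, 3))
```

In a real pipeline the fitness would more plausibly be a cross-validated score (the record reports five-fold cross-validation) rather than training accuracy; training accuracy is used here only to keep the sketch short.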
issn 1420-3049
doi 10.3390/molecules27103112
citation Molecules 2022, 27(10), 3112
affiliation Bowei Yan, Junshan Han, Song He, Xiaochen Bo: Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
Xiaona Ye, Kunhong Liu: School of Informatics, Xiamen University, Xiamen 361005, China
Jing Wang: School of Medicine, Tsinghua University, Beijing 100084, China
Lianlian Wu: Institute of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
topic DILI
genetic algorithm
ensemble learning
PCA/MCA
QSAR
molecular representation