Stacked Ensemble for Bioactive Molecule Prediction

Bioactive molecular compounds are essential for drug discovery. The biological activity of these compounds needs to be predicted because it is used to determine drug-target ability. Because ineffective drugs are discarded after production, wasting resources and time, it is important to predict...

Bibliographic Details
Main Authors: Olutomilayo Olayemi Petinrin, Faisal Saeed
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Bioactive molecule prediction; chemoinformatics; drug discovery; ensemble; stacked ensemble
Online Access: https://ieeexplore.ieee.org/document/8856204/
author Olutomilayo Olayemi Petinrin
Faisal Saeed
collection DOAJ
description Bioactive molecular compounds are essential for drug discovery. The biological activity of these compounds needs to be predicted because it is used to determine drug-target ability. Because ineffective drugs are discarded after production, wasting resources and time, it is important to predict bioactive molecules with models that have high predictive performance. This study uses a stacked ensemble, in which the predictions of multiple base classifiers serve as features for training a meta classifier that makes the final prediction. Using three datasets (DS1, DS2, and DS3) obtained from the MDL Drug Data Report (MDDR) database, the performance of the stacked ensemble was compared with that of three other ensembles (AdaBoost, bagging, and vote) on several evaluation criteria and with a statistical method, Kendall's W test. The accuracy of the stacked ensemble was 96.7002%, 98.2260%, and 94.9007% for the three datasets, respectively, although Vote had the best accuracy on dataset DS2, which consists of structurally homogeneous bioactive molecules. When Kendall's W test was used to rank the ensembles, the stacked ensemble ranked best on datasets DS1 and DS3, with a mean rank of 4.00 on both and an overall level of agreement, W, of 0.986 and 1.000, respectively. On dataset DS2, it ranked after Vote and AdaBoost, with a mean rank of 2.33 and an overall level of agreement, W, of 0.857. The stacked ensemble is recommended for predicting heterogeneous bioactive molecules during drug discovery and can also be applied in other research areas.
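The description above ranks the ensembles with Kendall's W test. As a minimal sketch of how that statistic can be computed, assuming untied ranks and the standard formula W = 12S / (m^2 (n^3 - n)) for m raters and n ranked items, the following Python function returns W together with the mean rank of each item. The function name `kendalls_w` and the example rankings are hypothetical illustrations, not the paper's code or data.

```python
from typing import List, Sequence, Tuple


def kendalls_w(rankings: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Return (W, mean_ranks) for a table of untied ranks.

    rankings[i][j] is the rank that rater i (for example, one evaluation
    criterion) assigns to item j (for example, one ensemble method).
    """
    m = len(rankings)          # number of raters
    n = len(rankings[0])       # number of ranked items
    if m < 1 or n < 2:
        raise ValueError("need at least one rater and two items")
    if any(len(row) != n for row in rankings):
        raise ValueError("every rater must rank the same number of items")

    # Sum of the ranks each item received across all raters.
    rank_sums = [sum(row[j] for row in rankings) for j in range(n)]
    # Expected rank sum per item if the raters did not agree at all.
    mean_sum = m * (n + 1) / 2
    # S: squared deviations of the observed rank sums from that expectation.
    s = sum((total - mean_sum) ** 2 for total in rank_sums)

    w = 12 * s / (m ** 2 * (n ** 3 - n))
    mean_ranks = [total / m for total in rank_sums]
    return w, mean_ranks


# Hypothetical ranks (higher = better) for four ensembles, in the order
# AdaBoost, Bagging, Vote, Stacked, given by three evaluation criteria.
full_agreement = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
partial_agreement = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 2, 4, 3]]

w_full, ranks_full = kendalls_w(full_agreement)
w_partial, ranks_partial = kendalls_w(partial_agreement)
```

When every criterion produces the same ordering, W reaches its maximum of 1 and the top-ranked item has a mean rank equal to the number of items; any disagreement among the criteria lowers W toward 0.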
first_indexed 2024-12-22T19:48:50Z
format Article
id doaj.art-35ade5a5f1bf41f2bd3097e9d6e75d11
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T19:48:50Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-35ade5a5f1bf41f2bd3097e9d6e75d11; 2022-12-21T18:14:37Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; vol. 7; pp. 153952-153957; DOI 10.1109/ACCESS.2019.2945422; 8856204; Stacked Ensemble for Bioactive Molecule Prediction; Olutomilayo Olayemi Petinrin (Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia); Faisal Saeed (https://orcid.org/0000-0002-2822-1708; Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia); https://ieeexplore.ieee.org/document/8856204/; Bioactive molecule prediction; chemoinformatics; drug discovery; ensemble; stacked ensemble
title Stacked Ensemble for Bioactive Molecule Prediction
topic Bioactive molecule prediction
chemoinformatics
drug discovery
ensemble
stacked ensemble
url https://ieeexplore.ieee.org/document/8856204/