Stacked Ensemble for Bioactive Molecule Prediction
Bioactive molecular compounds are essential for drug discovery. The biological activity of these compounds needs to be predicted as this is used to determine the drug-target ability. As ineffective drugs are discarded after production, leading to resource and time wastage, it is important to predict...
Main Authors: | Olutomilayo Olayemi Petinrin, Faisal Saeed |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Bioactive molecule prediction; chemoinformatics; drug discovery; ensemble; stacked ensemble |
Online Access: | https://ieeexplore.ieee.org/document/8856204/ |
_version_ | 1819171283843153920 |
---|---|
author | Olutomilayo Olayemi Petinrin; Faisal Saeed |
author_sort | Olutomilayo Olayemi Petinrin |
collection | DOAJ |
description | Bioactive molecular compounds are essential for drug discovery. Their biological activity needs to be predicted because it is used to determine drug-target ability. Because ineffective drugs are discarded after production, wasting resources and time, it is important to predict bioactive molecules with models that have high predictive performance. This study uses a stacked ensemble, in which the predictions of multiple base classifiers serve as features for training a meta classifier that makes the final prediction. Using three datasets, DS1, DS2, and DS3, obtained from the MDL Drug Data Report (MDDR) database, the performance of the stacked ensemble was compared with that of three other ensembles (AdaBoost, bagging, and vote) on several evaluation criteria and with a statistical method, Kendall's W test. The accuracy of the stacked ensemble was 96.7002%, 98.2260%, and 94.9007% on the three datasets, respectively, although Vote had the best accuracy on dataset DS2, which consists of structurally homogeneous bioactive molecules. When Kendall's W test was used to rank the ensembles, the stacked ensemble ranked best on datasets DS1 and DS3, with a mean rank of 4.00 on both and an overall level of agreement, W, of 0.986 and 1.000, respectively. On dataset DS2, it ranked after Vote and AdaBoost, with a mean rank of 2.33 and W of 0.857. The stacked ensemble is recommended for predicting heterogeneous bioactive molecules during drug discovery and can also be applied in other research areas. |
first_indexed | 2024-12-22T19:48:50Z |
format | Article |
id | doaj.art-35ade5a5f1bf41f2bd3097e9d6e75d11 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-22T19:48:50Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-35ade5a5f1bf41f2bd3097e9d6e75d11; 2022-12-21T18:14:37Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 153952; 153957; 10.1109/ACCESS.2019.2945422; 8856204; Stacked Ensemble for Bioactive Molecule Prediction; Olutomilayo Olayemi Petinrin (0); Faisal Saeed (1); https://orcid.org/0000-0002-2822-1708; Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia; Information Systems Department, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia; https://ieeexplore.ieee.org/document/8856204/; Bioactive molecule prediction; chemoinformatics; drug discovery; ensemble; stacked ensemble |
title | Stacked Ensemble for Bioactive Molecule Prediction |
title_sort | stacked ensemble for bioactive molecule prediction |
topic | Bioactive molecule prediction; chemoinformatics; drug discovery; ensemble; stacked ensemble |
url | https://ieeexplore.ieee.org/document/8856204/ |
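
The description above summarizes stacking as using the predictions of several base classifiers as features for a meta classifier. The sketch below illustrates that general idea only. It is not the authors' implementation: the record does not name their base or meta classifiers, so the threshold "stumps", the lookup-table meta classifier, the two-fold split, and the toy data are all hypothetical choices made for this illustration.

```python
from collections import Counter, defaultdict


def fit_stump(X, y, feature):
    """Hypothetical base classifier: thresholds a single feature at the
    midpoint of the two class means. Assumes both classes are present."""
    pos = [x[feature] for x, label in zip(X, y) if label == 1]
    neg = [x[feature] for x, label in zip(X, y) if label == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x[feature] - threshold) > 0 else 0


def fit_stacked(X, y, n_features, k=2):
    """Train one stump per feature, then a meta classifier on their
    out-of-fold predictions (the 'predictions as features' step)."""
    meta_X = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        bases = [
            fit_stump([X[i] for i in train], [y[i] for i in train], f)
            for f in range(n_features)
        ]
        for i in held_out:
            meta_X[i] = tuple(base(X[i]) for base in bases)

    # Hypothetical meta classifier: most common label seen for each
    # combination of base predictions.
    votes = defaultdict(Counter)
    for meta_row, label in zip(meta_X, y):
        votes[meta_row][label] += 1
    table = {row: c.most_common(1)[0][0] for row, c in votes.items()}

    # Refit the base classifiers on all data for use at prediction time.
    final_bases = [fit_stump(X, y, f) for f in range(n_features)]

    def predict(x):
        row = tuple(base(x) for base in final_bases)
        if row in table:
            return table[row]
        # Unseen combination: fall back to a majority vote of the bases.
        return 1 if 2 * sum(row) >= len(row) else 0

    return predict


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [(0.9, 0.2), (0.8, 0.7), (0.1, 0.3), (0.2, 0.8),
     (0.7, 0.4), (0.85, 0.6), (0.3, 0.5), (0.15, 0.1)]
y = [1, 1, 0, 0, 1, 1, 0, 0]

predict = fit_stacked(X, y, n_features=2)
print(predict((0.75, 0.3)), predict((0.05, 0.9)))
```

Training the meta classifier on out-of-fold predictions, rather than on predictions for data the base classifiers have already seen, is a common way to keep it from simply trusting overfitted base models. A vote ensemble, by contrast, combines the base predictions with a fixed rule and learns no meta classifier.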