Predicting anti-SARS-CoV-2 activities of chemical compounds using machine learning models
To accelerate the discovery of novel drug candidates for Coronavirus Disease 2019 (COVID-19) therapeutics, we reported a series of machine learning (ML)-based models to accurately predict the anti-SARS-CoV-2 activities of screening compounds. We explored 6 popular ML algorithms in combination with 1...
Main Authors: | Beihong Ji, Yuhui Wu, Elena N. Thomas, Jocelyn N. Edwards, Xibing He, Junmei Wang |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-12-01 |
Series: | Artificial Intelligence Chemistry |
Subjects: | COVID-19; Machine Learning; In Silico Screening; Antiviral Compounds; SHAP Analysis; Attentive FP |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2949747723000295 |
_version_ | 1797237852342321152 |
---|---|
author | Beihong Ji; Yuhui Wu; Elena N. Thomas; Jocelyn N. Edwards; Xibing He; Junmei Wang |
author_facet | Beihong Ji; Yuhui Wu; Elena N. Thomas; Jocelyn N. Edwards; Xibing He; Junmei Wang |
author_sort | Beihong Ji |
collection | DOAJ |
description | To accelerate the discovery of novel drug candidates for Coronavirus Disease 2019 (COVID-19) therapeutics, we report a series of machine learning (ML)-based models to accurately predict the anti-SARS-CoV-2 activities of screening compounds. We explored 6 popular ML algorithms in combination with 15 molecular descriptors for molecular structures from 9 screening assays in the COVID-19 OpenData Portal hosted by NCATS. The models constructed by k-nearest neighbors (KNN) using the molecular descriptor GAFF+RDKit achieved the best overall performance among the ML algorithms, with the highest average accuracy (0.68) and a relatively high average area under the receiver operating characteristic curve (0.74). For all assays, the KNN model using the GAFF+RDKit descriptor also outperformed KNN models using other descriptors. The overall performance of our models was better than that of REDIAL-2020 (R). A web server (https://clickff.org/amberweb/covid-19-cp) was developed to enable users to predict the anti-SARS-CoV-2 activities of arbitrary compounds using the COVID-19-CP (P) models. Besides the descriptor-based ML models, we also developed graph-based Attentive FP (A) models for the 9 assays. The Attentive FP models achieved performance comparable to that of COVID-19-CP and outperformed the REDIAL-2020 models. The consensus prediction utilizing both COVID-19-CP and Attentive FP significantly boosts prediction accuracy, as assessed by comparing its performance with that of the three individual models (R, P, A) using the Wilcoxon signed-rank test, and thus can ultimately improve the success rate of COVID-19 drug discovery. |
first_indexed | 2024-04-24T17:26:20Z |
format | Article |
id | doaj.art-cf0fd9a14f7147c7a5208e564a12e5cf |
institution | Directory of Open Access Journals |
issn | 2949-7477 |
language | English |
last_indexed | 2024-04-24T17:26:20Z |
publishDate | 2023-12-01 |
publisher | Elsevier |
record_format | Article |
series | Artificial Intelligence Chemistry |
spelling | doaj.art-cf0fd9a14f7147c7a5208e564a12e5cf; 2024-03-28T06:40:17Z; eng; Elsevier; Artificial Intelligence Chemistry; 2949-7477; 2023-12-01; 1; 2; 100029; Predicting anti-SARS-CoV-2 activities of chemical compounds using machine learning models; Beihong Ji (0), Yuhui Wu (1), Elena N. Thomas (2), Jocelyn N. Edwards (3), Xibing He (4), Junmei Wang (5, corresponding author); affiliation (0-5): Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA; http://www.sciencedirect.com/science/article/pii/S2949747723000295; COVID-19; Machine Learning; In Silico Screening; Antiviral Compounds; SHAP Analysis; Attentive FP |
spellingShingle | Beihong Ji Yuhui Wu Elena N. Thomas Jocelyn N. Edwards Xibing He Junmei Wang Predicting anti-SARS-CoV-2 activities of chemical compounds using machine learning models Artificial Intelligence Chemistry COVID-19 Machine Learning In Silico Screening Antiviral Compounds SHAP Analysis Attentive FP |
title | Predicting anti-SARS-CoV-2 activities of chemical compounds using machine learning models |
title_sort | predicting anti sars cov 2 activities of chemical compounds using machine learning models |
topic | COVID-19; Machine Learning; In Silico Screening; Antiviral Compounds; SHAP Analysis; Attentive FP |
url | http://www.sciencedirect.com/science/article/pii/S2949747723000295 |
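
The workflow summarized in the description field (a KNN classifier over descriptor vectors, scored by accuracy and ROC AUC, then combined with a second model into a consensus prediction) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the two-dimensional descriptor vectors, the labels, `k=3`, the 0.5 threshold, the averaging rule for the consensus, and the `other_scores` list standing in for a graph-based model are all hypothetical values chosen for illustration.

```python
from math import dist


def knn_predict_proba(train_X, train_y, query, k=3):
    """Fraction of active (label 1) compounds among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return sum(label for _, label in nearest) / k


def accuracy(y_true, scores, threshold=0.5):
    """Share of compounds whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus(scores_a, scores_b):
    """Simple consensus: average the two models' scores per compound (an assumed rule)."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Hypothetical toy descriptors (NOT GAFF+RDKit values); 0 = inactive, 1 = active.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.2, 0.2), (0.8, 0.8), (0.4, 0.4), (0.6, 0.65)]
test_y = [0, 1, 0, 1]

knn_scores = [knn_predict_proba(train_X, train_y, x) for x in test_X]
# Made-up scores standing in for a second (e.g. graph-based) model.
other_scores = [0.4, 0.7, 0.6, 0.9]
consensus_scores = consensus(knn_scores, other_scores)

for name, scores in [("knn", knn_scores), ("other", other_scores), ("consensus", consensus_scores)]:
    print(name, accuracy(test_y, scores), roc_auc(test_y, scores))
```

On real screening data the comparison between individual and consensus models would be made per assay, and the paired per-assay scores could then be compared with a test such as the Wilcoxon signed-rank test mentioned in the description.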