Predicting anti-SARS-CoV-2 activities of chemical compounds using machine learning models

To accelerate the discovery of novel drug candidates for Coronavirus Disease 2019 (COVID-19) therapeutics, we report a series of machine learning (ML)-based models to accurately predict the anti-SARS-CoV-2 activities of screening compounds. We explored 6 popular ML algorithms in combination with 1...

Bibliographic Details
Main Authors: Beihong Ji, Yuhui Wu, Elena N. Thomas, Jocelyn N. Edwards, Xibing He, Junmei Wang
Format: Article
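Language: English
Published: Elsevier 2023-12-01
Series: Artificial Intelligence Chemistry
Subjects: COVID-19; Machine Learning; In Silico Screening; Antiviral Compounds; SHAP Analysis; Attentive FP
Online Access: http://www.sciencedirect.com/science/article/pii/S2949747723000295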
author Beihong Ji
Yuhui Wu
Elena N. Thomas
Jocelyn N. Edwards
Xibing He
Junmei Wang
collection DOAJ
description To accelerate the discovery of novel drug candidates for Coronavirus Disease 2019 (COVID-19) therapeutics, we report a series of machine learning (ML)-based models to accurately predict the anti-SARS-CoV-2 activities of screening compounds. We explored 6 popular ML algorithms in combination with 15 molecular descriptors for molecular structures from 9 screening assays in the COVID-19 OpenData Portal hosted by NCATS. The models constructed by k-nearest neighbors (KNN) using the molecular descriptor GAFF+RDKit achieved the best overall performance, with the highest average accuracy (0.68) and a relatively high average area under the receiver operating characteristic curve (0.74), better than the other ML algorithms. Likewise, across all assays, the KNN models using the GAFF+RDKit descriptor outperformed the KNN models using other descriptors. The overall performance of our models was better than that of REDIAL-2020 (R). A web server (https://clickff.org/amberweb/covid-19-cp) was developed to enable users to predict the anti-SARS-CoV-2 activities of arbitrary compounds using the COVID-19-CP (P) models. Besides the descriptor-based machine learning models, we also developed graph-based Attentive FP (A) models for the 9 assays. The Attentive FP models achieved performance comparable to that of COVID-19-CP and outperformed the REDIAL-2020 models. The consensus prediction utilizing both COVID-19-CP and Attentive FP significantly boosts prediction accuracy, as assessed by comparing its performance with that of the three individual models (R, P, A) using the Wilcoxon signed-rank test, and can thus ultimately improve the success rate of COVID-19 drug discovery.
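The description reports KNN models scored by accuracy and area under the ROC curve. The sketch below illustrates those three ideas in plain Python on made-up data; it is not the authors' pipeline. The two-dimensional "descriptor" vectors, the labels, the choice of k = 3, the Euclidean distance, and the 0.5 decision threshold are all assumptions for illustration, standing in for real GAFF+RDKit descriptors and assay outcomes.

```python
import math

def knn_predict_proba(train_x, train_y, query, k=3):
    """Fraction of the k nearest training compounds labelled active (1)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))
    return sum(label for _, label in nearest[:k]) / k

def accuracy(y_true, y_score, threshold=0.5):
    """Share of compounds whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, y_score))
    return hits / len(y_true)

def roc_auc(y_true, y_score):
    """Probability that a random active outscores a random inactive.

    Ties count as half a win, which equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical descriptor vectors: three inactives (0) and three actives (1).
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3),
           (0.9, 0.8), (0.8, 0.9), (0.85, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]

# Hypothetical held-out compounds, two of them deliberately hard cases.
test_x = [(0.2, 0.2), (0.8, 0.8), (0.5, 0.45), (0.6, 0.6)]
test_y = [0, 1, 1, 0]

scores = [knn_predict_proba(train_x, train_y, q) for q in test_x]
acc = accuracy(test_y, scores)
auc = roc_auc(test_y, scores)
print(f"accuracy={acc:.2f}  ROC AUC={auc:.3f}")
```

Accuracy depends on the chosen threshold while ROC AUC only depends on how the scores rank actives against inactives, which is one reason to report both.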
first_indexed 2024-04-24T17:26:20Z
format Article
id doaj.art-cf0fd9a14f7147c7a5208e564a12e5cf
institution Directory Open Access Journal
issn 2949-7477
language English
last_indexed 2024-04-24T17:26:20Z
publishDate 2023-12-01
publisher Elsevier
record_format Article
series Artificial Intelligence Chemistry
spelling doaj.art-cf0fd9a14f7147c7a5208e564a12e5cf
2024-03-28T06:40:17Z
eng
Elsevier
Artificial Intelligence Chemistry
2949-7477
2023-12-01
Volume 1, Issue 2, Article 100029
Predicting anti-SARS-CoV-2 activities of chemical compounds using machine learning models
Beihong Ji, Yuhui Wu, Elena N. Thomas, Jocelyn N. Edwards, Xibing He, Junmei Wang (corresponding author)
All authors: Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
http://www.sciencedirect.com/science/article/pii/S2949747723000295
COVID-19; Machine Learning; In Silico Screening; Antiviral Compounds; SHAP Analysis; Attentive FP
title Predicting anti-SARS-CoV-2 activities of chemical compounds using machine learning models
topic COVID-19
Machine Learning
In Silico Screening
Antiviral Compounds
SHAP Analysis
Attentive FP
url http://www.sciencedirect.com/science/article/pii/S2949747723000295
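The abstract states that a consensus of COVID-19-CP and Attentive FP was compared with the individual models using the Wilcoxon signed-rank test. The record does not say how the consensus is formed, so the sketch below assumes a simple average of two predicted probabilities, and it implements the paired test by hand with an exact two-sided p-value. The per-assay counts of correct predictions are invented for illustration and are not results from the paper.

```python
from itertools import product

def consensus(prob_a, prob_b):
    """Assumed consensus rule: average the two models' probabilities."""
    return [(a + b) / 2 for a, b in zip(prob_a, prob_b)]

def signed_ranks(x, y):
    """Paired differences (zeros dropped) and average ranks of |difference|."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    return diffs, ranks

def wilcoxon_signed_rank(x, y):
    """Return (statistic, exact two-sided p-value) for paired samples."""
    diffs, ranks = signed_ranks(x, y)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    stat = min(w_plus, w_minus)
    total = sum(ranks)
    # Enumerate every sign assignment; feasible for a handful of assays.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if min(w, total - w) <= stat:
            extreme += 1
    return stat, extreme / 2 ** len(ranks)

# Averaging two hypothetical probability vectors for two compounds.
blended = consensus([0.25, 1.0], [0.75, 0.5])

# Invented correct-prediction counts (out of 100) on 9 assays.
consensus_correct = [72, 70, 75, 68, 71, 74, 69, 73, 77]
single_correct = [68, 69, 70, 66, 72, 70, 65, 70, 71]

stat, p_value = wilcoxon_signed_rank(consensus_correct, single_correct)
print(f"W={stat}  p={p_value:.4f}")
```

Integer counts are used so that tied differences compare exactly; with floating-point accuracies the differences would need rounding before ranking. The exhaustive enumeration grows as 2^n, so it only suits small numbers of paired assays.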