AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection

Abstract: Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accel...


Bibliographic Details
Main Authors:	Austin Clyde, Xuefeng Liu, Thomas Brettin, Hyunseung Yoo, Alexander Partin, Yadu Babuji, Ben Blaiszik, Jamaludin Mohd-Yusof, Andre Merzky, Matteo Turilli, Shantenu Jha, Arvind Ramanathan, Rick Stevens
Format:	Article
Language:	English
Published:	Nature Portfolio 2023-02-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-023-28785-9

author	Austin Clyde Xuefeng Liu Thomas Brettin Hyunseung Yoo Alexander Partin Yadu Babuji Ben Blaiszik Jamaludin Mohd-Yusof Andre Merzky Matteo Turilli Shantenu Jha Arvind Ramanathan Rick Stevens
author_sort	Austin Clyde
collection	DOAJ
description	Abstract: Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50k predictions per GPU second). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate of less than 0.01% in detecting the underlying best-scoring 0.1% of compounds. Our analysis of the speedup explains that another order of magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million “in-stock” molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100× or even 1000× faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.
first_indexed	2024-04-10T15:45:31Z
format	Article
id	doaj.art-9350cd1448aa41b5828f39539f4ae94c
institution	Directory Open Access Journal
issn	2045-2322
language	English
last_indexed	2024-04-10T15:45:31Z
publishDate	2023-02-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj.art-9350cd1448aa41b5828f39539f4ae94c | 2023-02-12T12:09:40Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2023-02-01 | 13 | 1 | 1 | 14 | 10.1038/s41598-023-28785-9 | AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection | Authors (in record order, 0 to 12): Austin Clyde; Xuefeng Liu; Thomas Brettin; Hyunseung Yoo; Alexander Partin; Yadu Babuji; Ben Blaiszik; Jamaludin Mohd-Yusof; Andre Merzky; Matteo Turilli; Shantenu Jha; Arvind Ramanathan; Rick Stevens | Affiliations (in record order): Argonne National Laboratory, Data Science and Learning Division; Department of Computer Science, University of Chicago; Department of Computer Science, University of Chicago; Argonne National Laboratory, Data Science and Learning Division; Argonne National Laboratory, Data Science and Learning Division; Department of Computer Science, University of Chicago; Argonne National Laboratory, Data Science and Learning Division; Los Alamos National Laboratory, Computer, Computational, and Statistical Sciences; Department of Electrical and Computer Engineering, Rutgers University; Department of Electrical and Computer Engineering, Rutgers University; Department of Electrical and Computer Engineering, Rutgers University; Argonne National Laboratory, Data Science and Learning Division; Department of Computer Science, University of Chicago | https://doi.org/10.1038/s41598-023-28785-9
title	AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection
url	https://doi.org/10.1038/s41598-023-28785-9
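
The abstract in this record gives two quantities that are easy to check by hand: the surrogate model's throughput (1 billion molecules at 50k predictions per GPU second, which works out to about 5.6 GPU-hours) and the miss rate when the surrogate is used as a pre-filter ahead of docking. The following is a minimal sketch of both calculations, not the authors' code. The function names, the synthetic scores, and the convention that a lower score is better are all assumptions made for illustration.

```python
import random


def gpu_hours(n_molecules, preds_per_gpu_second):
    """GPU-hours needed to score n_molecules at a given surrogate throughput."""
    return n_molecules / preds_per_gpu_second / 3600.0


def top_fraction(scores, fraction):
    """Indices of the best-scoring fraction (assumption: lower score is better)."""
    k = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(order[:k])


def prefilter_recall(dock_scores, surrogate_scores, keep_fraction, hit_fraction=0.001):
    """Share of the true top hits (by docking score) that survive the pre-filter.

    Only the molecules ranked in the surrogate's best keep_fraction would be
    passed on to full docking; 1 - recall is the fraction of top hits missed.
    """
    true_hits = top_fraction(dock_scores, hit_fraction)
    kept = top_fraction(surrogate_scores, keep_fraction)
    return len(true_hits & kept) / len(true_hits)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins: "docking" scores plus a noisy surrogate of them.
    dock = [rng.gauss(-7.0, 1.5) for _ in range(20_000)]
    surrogate = [s + rng.gauss(0.0, 0.5) for s in dock]

    print(f"{gpu_hours(1_000_000_000, 50_000):.1f} GPU-hours for 1e9 predictions")
    recall = prefilter_recall(dock, surrogate, keep_fraction=0.10)
    print(f"recall of top 0.1% when docking only the surrogate's top 10%: {recall:.3f}")
```

Docking only the surrogate's top 10% corresponds to roughly a tenfold reduction in docking work, and the recall shows how that saving trades against missed top hits as surrogate noise grows. This is the accuracy dependence the abstract points to.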


Similar Items