GPU-accelerated Chemical Similarity Assessment for Large Scale Databases

The assessment of chemical similarity between molecules is a basic operation in chemoinformatics, a computational area concerned with the manipulation of chemical structural information. Comparing molecules is the basis for a wide range of applications such as searching in chemical databases, train...

Bibliographic Details
Main Authors:	Maggioni, Marco; Santambrogio, Marco Domenico; Liang, Jie
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	en_US
Published:	Elsevier B.V. 2014
Online Access:	http://hdl.handle.net/1721.1/92298

_version_	1826210028937281536
author	Maggioni, Marco; Santambrogio, Marco Domenico; Liang, Jie
author2	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Maggioni, Marco; Santambrogio, Marco Domenico; Liang, Jie
author_sort	Maggioni, Marco
collection	MIT
description	The assessment of chemical similarity between molecules is a basic operation in chemoinformatics, a computational area concerned with the manipulation of chemical structural information. Comparing molecules is the basis for a wide range of applications such as searching in chemical databases, training prediction models for virtual screening, or aggregating clusters of similar compounds. However, currently available multimillion-compound databases represent a challenge for conventional chemoinformatics algorithms, raising the necessity for faster similarity methods. In this paper, we extensively analyze the advantages of using many-core architectures for calculating some commonly used chemical similarity coefficients such as Tanimoto, Dice or Cosine. Our aim is to provide a wide-breadth proof-of-concept regarding the usefulness of GPU architectures to chemoinformatics, a class of computing problems still uncovered. In our work, we present a general GPU algorithm for all-to-all chemical comparisons considering both binary fingerprints and floating point descriptors as molecule representation. Subsequently, we adopt optimization techniques to minimize global memory accesses and to further improve efficiency. We test the proposed algorithm on two different experimental setups: a laptop with a low-end GPU and a desktop with a more performant GPU. In the former case, we obtain a 4-to-6-fold speed-up over a single-core implementation for fingerprints and a 4-to-7-fold speed-up for descriptors. In the latter case, we obtain a 195-to-206-fold speed-up and a 100-to-328-fold speed-up, respectively.
first_indexed	2024-09-23T14:40:11Z
format	Article
id	mit-1721.1/92298
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T14:40:11Z
publishDate	2014
publisher	Elsevier B.V.
record_format	dspace
spelling	mit-1721.1/92298 2022-09-29T10:04:33Z GPU-accelerated Chemical Similarity Assessment for Large Scale Databases. Maggioni, Marco; Santambrogio, Marco Domenico; Liang, Jie. Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory. Santambrogio, Marco Domenico. National Institutes of Health (U.S.) (grant GM079804); National Institutes of Health (U.S.) (grant GM086145). 2014-12-12T19:16:42Z 2011 Article http://purl.org/eprint/type/JournalArticle 18770509 http://hdl.handle.net/1721.1/92298 Maggioni, Marco, Marco Domenico Santambrogio, and Jie Liang. “GPU-Accelerated Chemical Similarity Assessment for Large Scale Databases.” Procedia Computer Science 4 (2011): 2007–2016. © 2011 Elsevier B.V. en_US http://dx.doi.org/10.1016/j.procs.2011.04.219 Procedia Computer Science Creative Commons Attribution http://creativecommons.org/licenses/by-nc-nd/3.0/ application/pdf Elsevier B.V. Elsevier
spellingShingle	Maggioni, Marco; Santambrogio, Marco Domenico; Liang, Jie. GPU-accelerated Chemical Similarity Assessment for Large Scale Databases
title	GPU-accelerated Chemical Similarity Assessment for Large Scale Databases
title_full	GPU-accelerated Chemical Similarity Assessment for Large Scale Databases
title_fullStr	GPU-accelerated Chemical Similarity Assessment for Large Scale Databases
title_full_unstemmed	GPU-accelerated Chemical Similarity Assessment for Large Scale Databases
title_short	GPU-accelerated Chemical Similarity Assessment for Large Scale Databases
title_sort	gpu accelerated chemical similarity assessment for large scale databases
url	http://hdl.handle.net/1721.1/92298
work_keys_str_mv	AT maggionimarco gpuacceleratedchemicalsimilarityassessmentforlargescaledatabases AT santambrogiomarcodomenico gpuacceleratedchemicalsimilarityassessmentforlargescaledatabases AT liangjie gpuacceleratedchemicalsimilarityassessmentforlargescaledatabases
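
The description above names three similarity coefficients (Tanimoto, Dice, Cosine) computed over binary fingerprints in an all-to-all comparison. As a rough CPU-side illustration only, and not the authors' GPU algorithm, the sketch below applies the standard textbook definitions of those coefficients to fingerprints stored as Python integers. The function names (`popcount`, `similarity`, `all_to_all_tanimoto`) are hypothetical, and returning 0.0 when a fingerprint has no bits set is an assumed convention.

```python
import math


def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a non-negative integer."""
    return bin(x).count("1")


def similarity(a: int, b: int) -> dict:
    """Tanimoto, Dice and Cosine coefficients for two binary fingerprints.

    na, nb: bits set in each fingerprint; nc: bits set in both.
    Assumption: a coefficient is 0.0 when its denominator would be zero.
    """
    na, nb = popcount(a), popcount(b)
    nc = popcount(a & b)
    union = na + nb - nc
    return {
        "tanimoto": nc / union if union else 0.0,
        "dice": 2 * nc / (na + nb) if (na + nb) else 0.0,
        "cosine": nc / math.sqrt(na * nb) if (na and nb) else 0.0,
    }


def all_to_all_tanimoto(fingerprints: list) -> list:
    """Full N x N Tanimoto matrix (single-threaded; quadratic in N)."""
    return [
        [similarity(a, b)["tanimoto"] for b in fingerprints]
        for a in fingerprints
    ]


if __name__ == "__main__":
    fps = [0b1101, 0b1011, 0b0110]  # toy 4-bit fingerprints, not real data
    for row in all_to_all_tanimoto(fps):
        print(row)
```

The quadratic pairwise loop in `all_to_all_tanimoto` is the part of the workload that the description says is moved onto many-core hardware; each matrix entry is independent of the others, which is what makes the comparison a natural fit for parallel execution.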

Similar Items