On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach

A large number of chemical compounds are available in databases such as PubChem and ZINC. However, the currently known compounds, though numerous, represent only a fraction of all possible compounds, collectively known as chemical space. Many of the compounds in these databases are annotated with properties and as...


Bibliographic Details
Main Authors: Sangsoo Lim, Sangseon Lee, Yinhua Piao, MinGyu Choi, Dongmin Bang, Jeonghyeon Gu, Sun Kim
Format: Article
Language: English
Published: Elsevier 2022-01-01
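Series: Computational and Structural Biotechnology Journal
Subjects: Chemical space; Deep learning; Computer-aided drug discovery; Data augmentation; Chemical information modeling
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037022003300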
author Sangsoo Lim
Sangseon Lee
Yinhua Piao
MinGyu Choi
Dongmin Bang
Jeonghyeon Gu
Sun Kim
collection DOAJ
description A large number of chemical compounds are available in databases such as PubChem and ZINC. However, the currently known compounds, though numerous, represent only a fraction of all possible compounds, collectively known as chemical space. Many of the compounds in these databases are annotated with properties and assay data that can be used for drug discovery efforts. To this end, a number of machine learning algorithms have been developed, and recent deep learning technologies can be used effectively to navigate chemical space, especially for unknown chemical compounds, in drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting the annotated properties and assay data in chemical compound databases. We first compile the kinds of tasks that machine learning methods aim to accomplish. Then, we survey deep learning technologies to show their modeling power and current applications in drug-related tasks. Next, we survey deep learning techniques that address the insufficiency of annotated data, enabling more effective navigation of chemical space. Chemical compound information alone may not be powerful enough for drug-related tasks, so we survey what kinds of information, such as assay and gene expression data, can be used to improve the predictive power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into the computational analysis of chemical information.
first_indexed 2024-04-11T05:19:10Z
format Article
id doaj.art-15d9b5055a1b4bf2862b75d40ec9fc2e
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-04-11T05:19:10Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-15d9b5055a1b4bf2862b75d40ec9fc2e
2022-12-24T04:53:42Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2022-01-01
Volume 20, pages 4288-4304
On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach
Sangsoo Lim [0]
Sangseon Lee [1]
Yinhua Piao [2]
MinGyu Choi [3]
Dongmin Bang [4]
Jeonghyeon Gu [5]
Sun Kim [6]
[0] Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
[1] Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
[2] Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
[3] Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea; AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
[4] Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
[5] Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
[6] Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea; Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea; MOGAM Institute for Biomedical Research, Yong-in 16924, South Korea; AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea; Corresponding author.
http://www.sciencedirect.com/science/article/pii/S2001037022003300
Chemical space
Deep learning
Computer-aided drug discovery
Data augmentation
Chemical information modeling
title On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach
topic Chemical space
Deep learning
Computer-aided drug discovery
Data augmentation
Chemical information modeling
url http://www.sciencedirect.com/science/article/pii/S2001037022003300