AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent data
Main Authors: | Yugo Shimizu, Masateru Ohta, Shoichi Ishida, Kei Terayama, Masanori Osawa, Teruki Honma, Kazuyoshi Ikeda |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2023-12-01 |
Series: | Journal of Cheminformatics |
Subjects: | |
Online Access: | https://doi.org/10.1186/s13321-023-00791-z |
_version_ | 1797388106297507840 |
---|---|
author | Yugo Shimizu, Masateru Ohta, Shoichi Ishida, Kei Terayama, Masanori Osawa, Teruki Honma, Kazuyoshi Ikeda |
collection | DOAJ |
description | Abstract Developing compounds with novel structures is important for the production of new drugs. From an intellectual property perspective, confirming the patent status of newly developed compounds is essential, particularly for pharmaceutical companies. Recent advances in artificial intelligence (AI) have made it possible to generate large numbers of compounds. However, confirming the patent status of these generated molecules has been a challenge: there are no free, easy-to-use tools for determining in a timely manner whether generated compounds are novel with respect to patents, and there is no suitable worldwide reference database of pharmaceutical patents. In this study, two public databases, SureChEMBL and Google Patents Public Datasets, were used to create a reference database of drug-related patented compounds, selected using the International Patent Classification. An exact structure search system was constructed using InChIKey and a relational database system to rapidly search for compounds in the reference database. Because drug-related patented compounds are a good source from which generative AI can learn useful chemical structures, they were used as the training data. Furthermore, molecule generation was successfully directed to increase or decrease the number of generated patented compounds by incorporating patent status (i.e., patented or not) into learning. The use of patent status enabled generation of novel molecules with high drug-likeness. Generative AI that uses patent information could help efficiently propose compounds that are novel with respect to pharmaceutical patents. Scientific contribution: In this study, a new molecule-generation method was developed that takes into account the patent status of molecules, an important feature in drug discovery that has rarely been considered.
The method enables the generation of novel molecules based on pharmaceutical patents with high drug-likeness and will help in the efficient development of effective drug compounds. |
first_indexed | 2024-03-08T22:34:58Z |
format | Article |
id | doaj.art-0b04847734da477a9d6dedc784708b85 |
institution | Directory Open Access Journal |
issn | 1758-2946 |
language | English |
last_indexed | 2024-03-08T22:34:58Z |
publishDate | 2023-12-01 |
publisher | BMC |
record_format | Article |
series | Journal of Cheminformatics |
spelling | doaj.art-0b04847734da477a9d6dedc784708b85; 2023-12-17T12:28:30Z; eng; BMC; Journal of Cheminformatics; ISSN 1758-2946; 2023-12-01; 15111; 10.1186/s13321-023-00791-z; AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent data; Yugo Shimizu (HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science), Masateru Ohta (HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science), Shoichi Ishida (Graduate School of Medical Life Science, Yokohama City University), Kei Terayama (Graduate School of Medical Life Science, Yokohama City University), Masanori Osawa (Division of Physics for Life Functions, Keio University Faculty of Pharmacy), Teruki Honma (RIKEN Center for Biosystems Dynamics Research), Kazuyoshi Ikeda (HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science); Patented compounds; Drug discovery; Database; Compound search; Molecular generation; Reward function |
title | AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent data |
topic | Patented compounds; Drug discovery; Database; Compound search; Molecular generation; Reward function |
url | https://doi.org/10.1186/s13321-023-00791-z |
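The abstract describes an exact structure search in which compounds are matched by InChIKey against a relational database of drug-related patented compounds selected by International Patent Classification (IPC), with patent status then fed into generative learning. The following is a minimal sketch of that idea using Python's built-in `sqlite3`, not the authors' implementation. The table schema, the choice of `A61K`/`A61P` as "drug-related" IPC prefixes, the helper names, and the multiplicative reward weighting are all hypothetical assumptions for illustration. InChIKeys are taken as precomputed strings; in practice they could be derived from structures with RDKit's `Chem.MolToInchiKey`.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical assumption: IPC prefixes treated as "drug-related" (not taken from the paper).
DRUG_IPC_PREFIXES = ("A61K", "A61P")


def build_reference_db(records: Iterable[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Load (inchikey, patent_id, ipc_code) rows, keeping only drug-related IPC codes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patented ("
        "inchikey TEXT NOT NULL, patent_id TEXT NOT NULL, ipc TEXT NOT NULL)"
    )
    # An index on InChIKey turns exact-structure search into an indexed equality lookup.
    conn.execute("CREATE INDEX idx_patented_inchikey ON patented (inchikey)")
    drug_rows = (r for r in records if r[2].startswith(DRUG_IPC_PREFIXES))
    conn.executemany("INSERT INTO patented VALUES (?, ?, ?)", drug_rows)
    conn.commit()
    return conn


def is_patented(conn: sqlite3.Connection, inchikey: str) -> bool:
    """Exact-structure search: a compound counts as patented if its InChIKey is stored."""
    row = conn.execute(
        "SELECT 1 FROM patented WHERE inchikey = ? LIMIT 1", (inchikey,)
    ).fetchone()
    return row is not None


def patent_aware_reward(base_score: float, patented: bool,
                        patented_weight: float = 0.5) -> float:
    """Hypothetical reward shaping: scale the score of patented molecules.

    patented_weight < 1 discourages generating patented compounds;
    patented_weight > 1 encourages it.
    """
    return base_score * patented_weight if patented else base_score


if __name__ == "__main__":
    # Toy placeholder InChIKeys and patent IDs; these do not refer to real compounds.
    toy_records = [
        ("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "P1", "A61K31/00"),
        ("BBBBBBBBBBBBBB-UHFFFAOYSA-N", "P2", "C07D401/00"),  # filtered out by IPC prefix
    ]
    db = build_reference_db(toy_records)
    print(is_patented(db, "AAAAAAAAAAAAAA-UHFFFAOYSA-N"))  # True
    print(is_patented(db, "BBBBBBBBBBBBBB-UHFFFAOYSA-N"))  # False
    print(patent_aware_reward(0.8, True))  # 0.4
```

Because InChIKey is a fixed-length hash of the structure, exact matching reduces to a string equality query, which a B-tree index answers quickly even for large tables. The reward weighting is only one simple way to push generation toward or away from patented structures.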